STEC EnhanceIO SSD Caching Software
25th December, 2012
1. WHAT IS ENHANCEIO?
The EnhanceIO driver is based on the EnhanceIO SSD caching software
product developed by STEC Inc. EnhanceIO was derived from Facebook's
open source Flashcache project. EnhanceIO uses SSDs as cache devices
for traditional rotating hard disk drives (referred to as source
volumes throughout this document).
EnhanceIO can work with any block device, be it an entire physical
disk, an individual disk partition, a RAIDed DAS device, a SAN volume,
a device mapper volume or a software RAID (md) device.
The source volume to SSD mapping is a set-associative mapping based on
the source volume sector number with a default set size
(aka associativity) of 512 blocks and a default block size of 4 KB.
Partial cache blocks are not used. The default value of 4 KB is chosen
because it is the common I/O block size of most storage systems. With
these default values, each cache set is 2 MB (512 * 4 KB). Therefore,
a 400 GB SSD will have a little less than 200,000 cache sets because a
little space is used for storing the meta data on the SSD.
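The set arithmetic above can be checked with a short Python sketch. The sector-to-set formula is an assumption modeled on Flashcache-style set-associative hashing: the block number is divided by the associativity, and the result is taken modulo the number of sets. The README does not give EnhanceIO's exact formula. `num_cache_sets` and `set_for_sector` are hypothetical helper names.

```python
SECTOR_SIZE = 512                      # bytes per source-volume sector
BLOCK_SIZE = 4 * 1024                  # default cache block size: 4 KB
ASSOCIATIVITY = 512                    # default set size: 512 blocks per set
SET_SIZE = BLOCK_SIZE * ASSOCIATIVITY  # 2 MB per cache set


def num_cache_sets(ssd_bytes: int, metadata_bytes: int = 0) -> int:
    """Whole cache sets that fit on the SSD after reserving metadata space."""
    return (ssd_bytes - metadata_bytes) // SET_SIZE


def set_for_sector(sector: int, num_sets: int) -> int:
    """Map a source-volume sector to a cache set (assumed Flashcache-style hash)."""
    block = sector * SECTOR_SIZE // BLOCK_SIZE  # source block holding this sector
    # Consecutive 2 MB chunks of the source volume land in consecutive sets.
    return (block // ASSOCIATIVITY) % num_sets


if __name__ == "__main__":
    sets = num_cache_sets(400 * 10**9)  # a 400 GB SSD, metadata ignored
    print(SET_SIZE, sets)               # 2097152 190734
```

The sketch takes 400 GB as 400 × 10^9 bytes and 2 MB as 2 × 2^20 bytes. With those units it gives 190,734 sets before any metadata is reserved. The README's "a little less than 200,000" is the same estimate expressed in rounder units. Reserving metadata space only lowers the count further.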
EnhanceIO supports three caching modes (read-only, write-through, and
write-back) and three cache replacement policies (random, FIFO, and LRU).
Read-only caching mode causes EnhanceIO to direct write IO requests only
to HDD. Read IO requests are issued to HDD and the data read from HDD is
stored on SSD. Subsequent read requests for the same blocks are served
from SSD, which reduces their latency substantially.
In write-through mode, reads are handled as in read-only mode.
Write-through mode causes EnhanceIO to write application data to both
HDD and SSD. Subsequent reads of the same data benefit because they can
be served from SSD.
Write-back improves write latency by writing application requested data
only to SSD. This data, referred to as dirty data, is copied later to
HDD asynchronously. Reads are handled similar to Read-only and
Write-through modes.
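The three modes differ only in where writes go. Reads are handled the same way in all of them. The toy Python model below illustrates this and is not EnhanceIO code. `ToyCache` and its methods are hypothetical, and the two devices are plain dictionaries. Dropping the cached copy on a read-only-mode write is an assumption the README does not state.

```python
MODES = ("read-only", "write-through", "write-back")


class ToyCache:
    """Dictionary-backed model of where reads and writes go in each caching mode."""

    def __init__(self, mode: str):
        if mode not in MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.hdd: dict[int, bytes] = {}  # source volume: block number -> data
        self.ssd: dict[int, bytes] = {}  # cache device: block number -> data
        self.dirty: set[int] = set()     # blocks on SSD not yet copied to HDD

    def write(self, block: int, data: bytes) -> None:
        if self.mode == "read-only":
            self.hdd[block] = data
            # Assumption: drop any cached copy so a later read cannot return stale data.
            self.ssd.pop(block, None)
        elif self.mode == "write-through":
            self.hdd[block] = data
            self.ssd[block] = data
        else:  # write-back: SSD only; the block stays dirty until copied to HDD
            self.ssd[block] = data
            self.dirty.add(block)

    def read(self, block: int) -> tuple[bytes, str]:
        """Return (data, device that served it); identical in all three modes."""
        if block in self.ssd:
            return self.ssd[block], "ssd"
        data = self.hdd[block]
        self.ssd[block] = data  # store what was read so later reads hit the SSD
        return data, "hdd"

    def clean(self) -> None:
        """Copy dirty blocks to HDD (asynchronous in EnhanceIO, synchronous here)."""
        for block in sorted(self.dirty):
            self.hdd[block] = self.ssd[block]
        self.dirty.clear()


if __name__ == "__main__":
    wb = ToyCache("write-back")
    wb.write(7, b"new")
    print(7 in wb.hdd, wb.read(7))  # False (b'new', 'ssd')
```

The write-back case shows why dirty data matters. Until `clean()` runs, the only copy of block 7 lives on the SSD.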
2. WHAT HAS ENHANCEIO CHANGED FROM FLASHCACHE?
2.1. A new write-back engine
The write-back engine in EnhanceIO has been designed from scratch.
Several optimizations have been done, and IO completion guarantees have
been improved. We have defined limits that let a user control the amount
of dirty data in a cache. Clean-up of dirty data is stopped by default
under a high load; this can be overridden if required. A user can
control the extent to which a single cache set can be filled with dirty
data. A background thread cleans up dirty data at regular intervals.
Clean-up is also done at regular intervals by identifying cache sets
which have been written least recently.
2.2. Transparent cache
EnhanceIO does not use device mapper. This enables creation and
deletion of caches while a source volume is in use. It is possible to
create or delete a cache while a partition is mounted.
EnhanceIO also supports creation of a cache for a device which contains
partitions. With this feature it's possible to create a cache without
worrying about having to create several SSD partitions and many
separate caches.
2.3. Large I/O Support
Unlike Flashcache, EnhanceIO does not cause source volume I/O requests
to be split into cache block size pieces. For the typical SSD cache
block size of 4 KB, this means that a write I/O request size of, say,
64 KB to the source volume is not split into 16 individual requests of
4 KB each. This is a performance improvement over Flashcache. IO
codepaths have been substantially modified for this improvement.
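The sketch below makes the 64 KB example concrete by showing the splitting step that EnhanceIO avoids. A hypothetical `split_into_blocks` helper cuts a byte range at cache-block boundaries, which is the behavior the README attributes to Flashcache. Under EnhanceIO, the same request would travel as a single piece.

```python
BLOCK_SIZE = 4096  # typical SSD cache block size (4 KB)


def split_into_blocks(offset: int, length: int,
                      block_size: int = BLOCK_SIZE) -> list[tuple[int, int]]:
    """Cut one request into (offset, length) pieces that never cross a block boundary."""
    pieces = []
    end = offset + length
    while offset < end:
        next_boundary = (offset // block_size + 1) * block_size
        piece_end = min(next_boundary, end)  # stop at the boundary or the request end
        pieces.append((offset, piece_end - offset))
        offset = piece_end
    return pieces


if __name__ == "__main__":
    print(len(split_into_blocks(0, 64 * 1024)))  # 16
    print(split_into_blocks(2048, 4096))         # [(2048, 2048), (4096, 2048)]
```

An unaligned request is cut at every block boundary it crosses. A 4 KB write that starts mid-block therefore becomes two pieces.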
2.4. Small Memory Footprint
Through a special compression algorithm, the meta data RAM usage has
been reduced to only 4 bytes for each SSD cache block (versus 16 bytes
in Flashcache). Since the most typical SSD cache block size is 4 KB,
this means that RAM usage is 0.1% (1/1000) of SSD capacity.
For example, for a 400 GB SSD, EnhanceIO will need only 400 MB to keep
all meta data in RAM.
For an SSD cache block size of 8 KB, RAM usage is 0.05% (1/2000) of SSD
capacity.
The compression algorithm needs at least 32,768 cache sets
(i.e., 16 bits to encode the set number). If the SSD capacity is small
and there are not at least 32,768 cache sets, EnhanceIO uses 8 bytes of
RAM for each SSD cache block. In this case, RAM usage is 0.2% (2/1000)
of SSD capacity for a cache block size of 4 KB.
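The sizing rules above can be written as a small function. `metadata_ram_bytes` is a hypothetical name. The sketch counts only whole cache sets and ignores the SSD space reserved for on-disk metadata.

```python
MIN_SETS_FOR_COMPRESSION = 32768  # with fewer sets, metadata is not compressed


def metadata_ram_bytes(ssd_bytes: int, block_size: int = 4096,
                       associativity: int = 512) -> int:
    """In-RAM metadata: 4 bytes per cache block, or 8 bytes when there are too few sets."""
    num_sets = ssd_bytes // (block_size * associativity)
    num_blocks = num_sets * associativity  # only whole sets are used
    bytes_per_block = 4 if num_sets >= MIN_SETS_FOR_COMPRESSION else 8
    return num_blocks * bytes_per_block


if __name__ == "__main__":
    print(metadata_ram_bytes(400 * 10**9))        # 390623232
    print(metadata_ram_bytes(400 * 10**9, 8192))  # 195311616
    print(metadata_ram_bytes(32 * 2**30))         # 67108864
```

For a 400 GB SSD with 4 KB blocks, the result is about 0.098% of capacity, in line with the README's "400 MB" estimate. With 8 KB blocks the figure halves. A 32 GiB SSD has only 16,384 sets, so it falls back to 8 bytes per block, about 0.2% of capacity.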
2.5. Loadable Replacement Policies
Since the SSD cache size is typically 10%-20% of the source volume
size, the set-associative nature of EnhanceIO necessitates cache
block replacement.
The main EnhanceIO kernel module that implements the caching engine
uses a random (actually, almost like round-robin) replacement policy
that does not require any additional RAM and has the least CPU
overhead. However, there are two additional kernel modules that
implement FIFO and LRU replacement policies. FIFO is the default cache
replacement policy because it uses less RAM than LRU. The FIFO and LRU
kernel modules are independent of each other and do not have to be
loaded if they are not needed.
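The difference between the two loadable policies can be sketched for a single cache set. This is a hypothetical Python model, not EnhanceIO's code. FIFO keeps one rotating cursor per set, while LRU keeps a recency order over every block in the set. This split is one plausible reading of why FIFO's RAM cost is per set and LRU's is per set plus per block.

```python
from collections import OrderedDict


class SetFIFO:
    """FIFO victim selection for one cache set: a single rotating cursor."""

    def __init__(self, associativity: int):
        self.associativity = associativity
        self.next_victim = 0  # the only per-set state

    def victim(self) -> int:
        # Assumes every fill goes into the returned slot, so the cursor
        # always points at the oldest fill in the set.
        slot = self.next_victim
        self.next_victim = (slot + 1) % self.associativity
        return slot


class SetLRU:
    """LRU victim selection for one cache set: recency order over all its blocks."""

    def __init__(self, associativity: int):
        # Per-block state: least recently used slot first, most recent last.
        self.order = OrderedDict((slot, None) for slot in range(associativity))

    def touch(self, slot: int) -> None:
        """Record a cache hit on a slot."""
        self.order.move_to_end(slot)

    def victim(self) -> int:
        slot = next(iter(self.order))  # least recently used slot
        self.order.move_to_end(slot)   # the refilled slot becomes most recent
        return slot


if __name__ == "__main__":
    fifo, lru = SetFIFO(4), SetLRU(4)
    lru.touch(0)
    print([fifo.victim() for _ in range(5)])  # [0, 1, 2, 3, 0]
    print(lru.victim())                       # 1
```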
Since the replacement policy modules do not consume much RAM when not
used, both modules are typically loaded after the main caching engine
is loaded. RAM is used only after a cache has been instantiated to use
either the FIFO or the LRU replacement policy.
Please note that the RAM used for replacement policies is in addition
to the RAM used for meta data (mentioned in Section 2.4). The table
below shows how much RAM each cache replacement policy uses:
POLICY    RAM USAGE
------    -----------------------------------------------
Random    0
FIFO      4 bytes per cache set
LRU       4 bytes per cache set + 4 bytes per cache block
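The per-policy costs in the table can be turned into a quick estimate. `policy_ram_bytes` is a hypothetical helper. The example's 190,734 sets assumes a 400 GB SSD (decimal gigabytes) with the default 2 MB set size and no metadata reservation.

```python
PER_SET_BYTES = {"random": 0, "fifo": 4, "lru": 4}
PER_BLOCK_BYTES = {"random": 0, "fifo": 0, "lru": 4}


def policy_ram_bytes(policy: str, num_sets: int, associativity: int = 512) -> int:
    """Replacement-policy RAM from the table, on top of the per-block metadata RAM."""
    num_blocks = num_sets * associativity
    return PER_SET_BYTES[policy] * num_sets + PER_BLOCK_BYTES[policy] * num_blocks


if __name__ == "__main__":
    sets = 190734  # 400 GB SSD, 2 MB sets
    for policy in ("random", "fifo", "lru"):
        print(policy, policy_ram_bytes(policy, sets))
    # random 0
    # fifo 762936
    # lru 391386168
```

For this cache size, FIFO adds under 1 MB. LRU adds about as much RAM as the compressed metadata itself, because of its per-block term.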
2.6. Optimal Alignment of Data Blocks on SSD
EnhanceIO writes all meta data and data blocks on 4K-aligned blocks
on the SSD. This minimizes write amplification and flash wear.
It also improves performance.
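Keeping every write on a 4K boundary is mostly offset arithmetic. The sketch assumes a simplified layout in which a metadata region comes first and data blocks follow it. This layout is an illustration, not EnhanceIO's documented on-SSD format. `align_up` and `data_block_offsets` are hypothetical helpers.

```python
ALIGNMENT = 4096  # 4K boundary


def align_up(offset: int, alignment: int = ALIGNMENT) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def data_block_offsets(metadata_bytes: int, num_blocks: int,
                       block_size: int = 4096) -> list[int]:
    """Hypothetical layout: metadata first, then data blocks starting on a 4K boundary."""
    if block_size % ALIGNMENT:
        raise ValueError("block size must be a multiple of 4K to keep blocks aligned")
    start = align_up(metadata_bytes)
    return [start + i * block_size for i in range(num_blocks)]


if __name__ == "__main__":
    print(data_block_offsets(5000, 3))  # [8192, 12288, 16384]
```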
2.7. Improved device failure handling
Failure of an SSD device in read-only and write-through modes is
handled gracefully by allowing I/O to continue to/from the
source volume. An application may notice a drop in performance but it
will not receive any I/O errors.
Failure of an SSD device in write-back mode obviously results in the
loss of dirty blocks in the cache. To guard against this data loss, two
SSD devices can be mirrored via RAID 1.
EnhanceIO identifies device failures based on error codes. Depending on
whether the failure is likely to be intermittent or permanent, it takes
the best suited action.
2.8. Coding optimizations
Several coding optimizations have been done to reduce CPU usage. These
include removing queues which are not required for write-through and
read-only cache modes, splitting a single large spinlock, and more.
Most of the code paths in Flashcache have been substantially
restructured.
2.9. Sequential I/O bypass
EnhanceIO has removed the sequential I/O bypass available in Flashcache.
The sequential detection logic is of limited use, especially in
reasonably multithreaded scenarios.
3. ENHANCEIO USAGE
3.1. Cache creation, deletion and editing properties
The eio_cli utility is used for creating and deleting caches and
editing their properties. Its manpage, eio_cli(8), provides more
information.
3.2. Making a cache configuration persistent
It's essential that a cache be resumed before any applications or a
filesystem use the source volume during bootup. If a cache is enabled
after the source volume has been written to, stale data may be present
in the cache, which may cause data corruption. The document
Persistent.txt describes how to enable a cache during bootup using udev
scripts.
If an SSD does not come up during bootup, it is acceptable to allow
read and write access to the HDD only in the case of a write-through or
read-only cache. A cache should be created again when the SSD becomes
available. If the previous cache configuration is resumed instead, stale
data may be read.
3.3. Using a Write-back cache
It's absolutely necessary to make a Write-back cache configuration
persistent. This is required particularly in the case of an OS crash or
a power failure. A Write-back cache may contain dirty blocks which
haven't been written to HDD yet. Reading the source volume without
enabling the cache will cause incorrect data to be read.
If an SSD does not come up during bootup, access to the HDD should be
stopped. It should be re-enabled only after the SSD comes up and the
cache is enabled.
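The boot-time rules from Sections 3.2 and 3.3 reduce to a small decision table. The sketch below is hypothetical: `boot_decision` and `BootDecision` are illustrative names, and Persistent.txt describes the real mechanism, which uses udev scripts. When the SSD is present, the sketch assumes the cache is resumed before the source volume is exposed.

```python
from dataclasses import dataclass

MODES = ("read-only", "write-through", "write-back")


@dataclass(frozen=True)
class BootDecision:
    allow_source_io: bool  # may applications and filesystems use the source volume now?
    cache_action: str      # "resume" the saved configuration, or "recreate" a new cache


def boot_decision(mode: str, ssd_present: bool) -> BootDecision:
    """Apply the README's boot rules for a cached source volume."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if ssd_present:
        # Resume the cache before anything touches the source volume.
        return BootDecision(allow_source_io=True, cache_action="resume")
    if mode == "write-back":
        # Dirty blocks exist only on the SSD: block HDD access, resume once the SSD returns.
        return BootDecision(allow_source_io=False, cache_action="resume")
    # Read-only / write-through: the HDD is complete, but the old cache could be stale.
    return BootDecision(allow_source_io=True, cache_action="recreate")
```

Write-back is the only mode in which a missing SSD blocks I/O to the source volume. It is also the only missing-SSD case in which the old configuration must be resumed rather than recreated.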
A write-back cache needs to perform a clean operation in order to flush
the dirty data to the source device (HDD). Cleaning can be triggered
either by the user or automatically, based on preconfigured
thresholds. These thresholds are described below. They can be set using
sysctl calls.
a) Dirty high threshold (%) : The upper limit on percentage of dirty
blocks in the entire cache.
b) Dirty low threshold (%) : The lower limit on percentage of dirty
blocks in the entire cache.
c) Dirty set high threshold (%) : The upper limit on percentage of dirty
blocks in a set.
d) Dirty set low threshold (%) : The lower limit on percentage of dirty
blocks in a set.
e) Automatic clean-up threshold : An automatic clean-up of the cache
will occur only if the number of outstanding I/O requests from the
HDD is below the threshold.
f) Time based clean-up interval (minutes) : This option allows you to
specify an interval between each clean-up process.
Cleaning is triggered when any one of the upper thresholds is met or the
time-based clean-up interval elapses, and it stops when all the lower
thresholds are met.
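The start/stop rule is a form of hysteresis: cleaning starts at an upper limit and runs until every lower limit is satisfied. In this hypothetical sketch, the field names mirror items a) to f) but are not the actual sysctl names, and user-triggered cleaning is not modeled. Three details are assumptions: using >= and <= for "met", applying the load check (item e) to time-based clean-up as well, and tracking only the dirtiest set rather than every set individually.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CleanThresholds:
    dirty_high_pct: float    # a) whole-cache upper limit
    dirty_low_pct: float     # b) whole-cache lower limit
    set_high_pct: float      # c) per-set upper limit
    set_low_pct: float       # d) per-set lower limit
    autoclean_max_ios: int   # e) auto clean-up only below this many outstanding HDD I/Os
    interval_minutes: float  # f) time-based clean-up interval


def should_start_clean(t: CleanThresholds, cache_dirty_pct: float,
                       max_set_dirty_pct: float, minutes_since_clean: float,
                       outstanding_hdd_ios: int) -> bool:
    """Start when any upper threshold or the interval is reached, unless the HDD is busy."""
    if outstanding_hdd_ios >= t.autoclean_max_ios:
        return False  # assumption: the load check also gates time-based clean-up
    return (cache_dirty_pct >= t.dirty_high_pct
            or max_set_dirty_pct >= t.set_high_pct
            or minutes_since_clean >= t.interval_minutes)


def should_stop_clean(t: CleanThresholds, cache_dirty_pct: float,
                      max_set_dirty_pct: float) -> bool:
    """Stop only when every lower threshold holds (whole cache and dirtiest set)."""
    return cache_dirty_pct <= t.dirty_low_pct and max_set_dirty_pct <= t.set_low_pct
```

Because the stop condition is stricter than the start condition, a cache that crosses the high mark is cleaned well below it. This avoids starting and stopping cleaning on every write.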
4. ACKNOWLEDGEMENTS
STEC acknowledges Facebook and in particular Mohan Srinivasan
for the design, development, and release of Flashcache as an
open source project.