Research On Key Technologies Of SSD-based Multi-level Storage Architecture

Posted on:2014-05-31

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Z G Chen

Full Text:PDF

GTID:1108330479979590

Subject:Computer Science and Technology

Abstract/Summary:

The storage subsystem is becoming increasingly important within the computer system as a whole. Hard disk-based storage systems cannot meet user demands for performance, energy efficiency, and other properties. Flash memory has advanced rapidly in recent years. As its density increases and its price falls, flash-based SSDs (solid-state drives) have been widely deployed in large-scale storage systems. However, SSDs remain more expensive than hard disks, and hard disk prices will continue to fall, so SSDs are unlikely to replace hard disks completely in the near future. Instead, industry usually incorporates SSDs into existing storage systems as a new level of the hierarchy. This thesis focuses on this new storage hierarchy and makes contributions in five areas.

(1) A page-to-block mapping FTL with low response time

Flash memory cannot be directly incorporated into existing storage systems because of peculiarities such as out-of-place updates and a limited lifespan. The common solution is to encapsulate flash chips into an SSD behind a flash translation layer (FTL). An FTL consists of address mapping, garbage collection, wear leveling, and other components. Garbage collection usually has a serious negative impact on performance. Existing FTLs reclaim a block with two operations, migration and erasure. The migration operation moves out the pages in the block that are still valid; the erasure operation then erases the block. Together, these two operations can hold up user I/O requests for a long time. This work proposes a page-to-block mapping FTL (PBFTL). PBFTL reclaims a block with either a migration operation or an erasure operation and rarely requires both. As a result, garbage collection is unlikely to hold up user requests for long.
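
The conventional two-step reclamation described above (migrate the valid pages, then erase the block) can be illustrated with a minimal sketch. The `Block` class, both function names, and the latency constants are hypothetical, chosen only for illustration; the abstract does not describe how PBFTL itself avoids one of the two steps, so that part is not modeled here.

```python
from dataclasses import dataclass, field

# Assumed latencies in microseconds; illustrative values, not from the thesis.
PAGE_READ_US = 25
PAGE_WRITE_US = 200
BLOCK_ERASE_US = 1500


@dataclass
class Block:
    """A flash block; valid[i] is True while page i still holds live data."""
    valid: list = field(default_factory=list)


def reclaim_cost_us(block: Block) -> int:
    """Time to reclaim a block the conventional way: migrate, then erase."""
    live_pages = sum(block.valid)
    migration = live_pages * (PAGE_READ_US + PAGE_WRITE_US)
    return migration + BLOCK_ERASE_US


def reclaim(block: Block, free_block: Block) -> None:
    """Copy the valid pages into free_block, then erase the victim block."""
    for is_valid in block.valid:
        if is_valid:
            free_block.valid.append(True)  # migration: rewrite the live page
    block.valid = []                       # erasure: the block is empty again


mostly_live = Block(valid=[True] * 60 + [False] * 4)
fully_stale = Block(valid=[False] * 64)
print(reclaim_cost_us(mostly_live))  # migration cost dominates
print(reclaim_cost_us(fully_stale))  # erase cost only
```

Under these assumed numbers, a block that still holds many valid pages takes far longer to reclaim than one that holds none, which is the stall that user requests queued behind garbage collection experience.
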
Experimental results demonstrate that PBFTL outperforms existing FTLs by more than 15% in terms of response time.

(2) An SSD-aware cache replacement policy

SSDs have been widely deployed in existing storage systems, yet the software components remain HDD-oriented and are rarely optimized for SSDs. This work rethinks the cache replacement policy for the case where the underlying storage consists of SSDs and proposes a policy that is aware of the parallelism inside SSDs. The principle of the SSD-aware cache (SAC) is as follows. An SSD contains multiple channels, and the workload is usually unbalanced across channels. I/O requests issued to busy channels are likely to suffer long latency, while I/O requests issued to idle channels return quickly. SAC therefore preferentially evicts pages that belong to idle channels, because these pages can be retrieved quickly if they are needed again. Conversely, pages from busy channels are protected from eviction. In this way, the average response time of all requests is reduced. However, SSDs are black boxes: the cache is unaware of their internal configuration and cannot directly identify the pages that belong to idle channels. Another contribution of this work is therefore a novel mechanism for selecting pages from idle channels even though the internal configuration of the SSD is not revealed.

(3) A hot page identification scheme for extending the lifespan of SSD-based caches

The lifespan of SSDs is limited. If all pages requested by applications are kept in the SSD-based cache, this limited lifespan is depleted rapidly. In practice, however, a large number of cold pages are never re-accessed after they are admitted to the cache. This work uses a new hot page identification scheme to prevent cold pages from entering the SSD-based cache. As a result, the lifespan of SSDs is significantly extended. The foundation of the new scheme is a data structure called UCBF (Ultra Counting Bloom Filter).
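
For readers unfamiliar with the family of structures that UCBF's name refers to, the following is a minimal sketch of a standard counting Bloom filter used as an approximate per-page access counter. It is not UCBF: the abstract does not describe UCBF's internals, and the class name, sizes, hashing scheme, and `HOT_THRESHOLD` are assumptions made for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Standard counting Bloom filter: an array of counters and k hashes."""

    def __init__(self, num_counters: int = 1024, num_hashes: int = 4):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _slots(self, key: str):
        # Derive k counter indices from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.counters)

    def add(self, key: str) -> None:
        """Record one access to key."""
        for slot in self._slots(key):
            self.counters[slot] += 1

    def remove(self, key: str) -> None:
        """Undo one earlier add(); only valid for keys that were added."""
        for slot in self._slots(key):
            if self.counters[slot] > 0:
                self.counters[slot] -= 1

    def estimate(self, key: str) -> int:
        """Upper bound on how often key was added (minimum of its counters)."""
        return min(self.counters[slot] for slot in self._slots(key))


HOT_THRESHOLD = 3  # hypothetical fixed threshold, for illustration only

cbf = CountingBloomFilter()
for _ in range(3):
    cbf.add("page-7")
is_hot = cbf.estimate("page-7") >= HOT_THRESHOLD
```

Because several keys can share a counter, the estimate can only overstate a page's access count, never understate it, and memory use depends on the number of counters rather than the number of pages tracked.
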
UCBF computes the hotness of each page with limited memory overhead. The hot page identification scheme maintains a watermark: only a page whose hotness exceeds the watermark is allowed to enter the SSD-based cache. The watermark adapts to the workload dynamically and continually. Experimental results demonstrate that the hot page identification scheme extends the lifespan of SSDs by 6 times and improves the cache hit ratio by more than 10%.

(4) A memory-efficient data structure for implementing replacement policies in large caches

An important feature of SSD-based caches is their large capacity. The replacement policy for such a large cache usually introduces excessive memory overhead. This work develops a new data structure that implements the LRU policy with limited memory overhead. Because LRU is the foundation of most complex cache replacement policies, these policies can also be implemented with limited memory overhead using the new data structure. The proposed data structure implements the LRU policy with a FIFO queue and a Bloom filter. The FIFO queue is kept on the SSD and thus introduces no memory overhead. The Bloom filter is kept in memory, but Bloom filters are known to be memory-efficient. As a result, the combination of FIFO queue and Bloom filter reduces the memory overhead of the cache replacement policy by a factor of about 10. The Bloom filter required here must support element deletion, but existing Bloom filters cannot support element deletion and remain memory-efficient. This work therefore proposes a memory-efficient Bloom filter that supports element deletion to serve the new data structure.

(5) Data prefetching and cache replacement policies for extending main memory with SSDs

Large-scale data processing usually requires a large main memory, but DRAM is poorly suited to building such a memory in terms of density, price, and energy efficiency.
Because SSDs are comparable to DRAM in terms of bandwidth and throughput, this work proposes extending main memory with SSDs. However, the latency of SSDs is much higher than that of DRAM, so the hybrid memory should ensure that most I/O requests are served by DRAM rather than by SSDs. This work takes two measures to raise the hit ratio in DRAM. The first is an adaptive prefetching policy that prefetches data blocks from SSDs into DRAM. The second is a cost-based cache replacement policy that manages DRAM. The adaptive prefetching policy ensures that data blocks are already in DRAM when they are demanded. When DRAM is full, the cost-based replacement policy preferentially evicts data blocks that are easy to prefetch. Whether a data block is easy to prefetch is determined by its access pattern: if a file exhibits a strong access pattern, the data blocks belonging to it are easy to prefetch. This work therefore proposes a novel pattern recognition scheme to support the prefetching and replacement policies above.
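
The idea of evicting blocks that are easy to prefetch can be sketched as follows. The abstract does not describe the thesis's pattern recognition scheme, so this sketch substitutes a simple assumption: a file's pattern strength is the fraction of its consecutive accesses that follow its most common stride. The function names and the example file names are hypothetical.

```python
from collections import Counter


def pattern_strength(offsets: list) -> float:
    """Fraction of consecutive accesses that follow the most common stride."""
    if len(offsets) < 3:
        return 0.0  # too little history to call anything a pattern
    strides = [b - a for a, b in zip(offsets, offsets[1:])]
    _, count = Counter(strides).most_common(1)[0]
    return count / len(strides)


def choose_victim(cached_blocks: dict, history: dict) -> str:
    """Pick the cached block whose file has the strongest access pattern.

    cached_blocks maps block id -> file name;
    history maps file name -> list of accessed block offsets, oldest first.
    """
    return max(
        cached_blocks,
        key=lambda blk: pattern_strength(history[cached_blocks[blk]]),
    )


history = {
    "scan.dat": [0, 1, 2, 3, 4, 5],   # sequential: every stride is 1
    "index.db": [7, 2, 9, 4, 1, 8],   # irregular strides
}
cached_blocks = {"b1": "index.db", "b2": "scan.dat"}
victim = choose_victim(cached_blocks, history)  # block of the sequential file
```

In this example the block from the sequentially scanned file is evicted first, since a prefetcher can bring it back cheaply, while the block from the irregularly accessed file stays in DRAM.
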

Keywords/Search Tags:

Flash Memory, FTL, SSD, SSD-based Cache, Cache Replacement Policy, Hybrid Memory, Hot Data Identification, Data Prefetching

Related items

1	Research On Optimization Of The Memory Cache Policy Based On Hadoop In Hybrid Cloud Environment
2	Research On Cache Optimization Mechanism In Heterogeneous Memory Environment
3	A Research On Cache Replacement Mechanism For Hybrid Memory Systems
4	Improving memory hierarchy performance with hardware prefetching and cache replacement
5	Cache memory design with embedded LRU replacement policy
6	Research On Compressed Cache Technology For Performance Optimization
7	The Research On Data Replacement Policy Based On Request Frequency Of NDN Cache
8	Design And Implementation Of Distributed Cache Management System For In-memory Columnar Database
9	Research On Memory Management And Cache Replacement Policies In Spark
10	Design And Implementation Of Web Proxy Server Based On Cache Replacement Technology