
Research On Key Technologies Of SSD-based Multi-level Storage Architecture

Posted on: 2014-05-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z G Chen
Full Text: PDF
GTID: 1108330479979590
Subject: Computer Science and Technology
Abstract/Summary:
The storage subsystem is becoming increasingly important within the computer system as a whole. Hard disk-based storage systems can no longer meet user demands for performance, energy efficiency and so on. In recent years, flash memory has advanced rapidly. As its density increases and its price falls, flash-based SSDs (solid-state drives) have been widely deployed in large-scale storage systems. However, SSDs remain more expensive than hard disks, and hard disk prices are expected to fall further, so SSDs are unlikely to replace hard disks completely in the near future. Instead, industry usually incorporates SSDs into existing storage systems as a new level of the storage hierarchy. This thesis focuses on this new storage hierarchy and makes contributions in five aspects.

(1) The Page-to-Block Mapping FTL with Low Response Time

Flash memory cannot be incorporated directly into existing storage systems because of peculiarities such as out-of-place updates and a limited lifespan. The common solution is to encapsulate flash chips into an SSD via a flash translation layer (FTL). An FTL consists of address mapping, garbage collection, wear leveling, etc. Garbage collection usually has a serious negative impact on performance. Existing FTLs reclaim a block via two operations, migration and erasure. The migration operation moves the pages within the block that are still valid; the erasure operation then erases the block. Together, these two operations can stall user I/O requests for a long time. This work proposes a page-to-block mapping FTL (PBFTL). To reclaim a block, PBFTL employs either the migration operation or the erasure operation, but rarely requires both. As a result, garbage collection is unlikely to stall user requests seriously.
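The cost of the conventional two-step reclaim described above can be illustrated with a small model. This is only a sketch of the baseline behaviour, not of PBFTL itself; the latency constants and the names `Block`, `reclaim_time_us` and `pick_victim` are assumptions introduced for illustration and do not come from the thesis.

```python
from dataclasses import dataclass
from typing import List

# Illustrative latencies in microseconds; assumed values, not taken from the thesis.
PAGE_READ_US = 25
PAGE_PROGRAM_US = 200
BLOCK_ERASE_US = 1500


@dataclass
class Block:
    block_id: int
    valid_pages: int  # pages that must be migrated before the block can be erased


def reclaim_time_us(block: Block) -> int:
    """Time a conventional FTL spends reclaiming one block: migration, then erasure."""
    migration = block.valid_pages * (PAGE_READ_US + PAGE_PROGRAM_US)
    return migration + BLOCK_ERASE_US


def pick_victim(blocks: List[Block]) -> Block:
    """Greedy victim selection: the block with the fewest valid pages is cheapest to reclaim."""
    return min(blocks, key=lambda b: b.valid_pages)


blocks = [Block(0, 40), Block(1, 0), Block(2, 12)]
victim = pick_victim(blocks)
print(victim.block_id, reclaim_time_us(victim))
```

Under these assumed numbers, a block with no valid pages needs only the erase, while a block with many valid pages pays for every migrated page as well, which is the long stall that the paragraph above attributes to existing FTLs.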
Experimental results demonstrate that PBFTL outperforms existing FTLs by more than 15% in terms of response time.

(2) An SSD-aware cache replacement policy

SSDs have been widely deployed in existing storage systems, but the software components remain HDD-oriented and are rarely optimized for SSDs. This work rethinks the cache replacement policy for the case where the underlying storage consists of SSDs, and proposes a policy that is aware of the parallelism inside SSDs. The principle of the SSD-aware cache (SAC) is as follows. An SSD contains multiple channels, and the workload across channels is usually unbalanced. I/O requests issued to busy channels are likely to experience long latency, while I/O requests issued to idle channels return quickly. SAC gives higher priority to evicting pages that reside on idle channels, because these pages can be retrieved quickly if they are demanded again. Conversely, pages from busy channels are protected from eviction. In this way, the average response time of all requests is reduced. However, SSDs are black boxes: the cache is unaware of their internal configuration and cannot directly identify which pages come from idle channels. Another contribution of this work is therefore a novel mechanism for selecting pages from idle channels even though the internal configuration of the SSD is not revealed.

(3) A hot page identification scheme used to extend the lifespan of SSD-based cache

The lifespan of SSDs is limited. If every page demanded by applications is kept in the SSD-based cache, this limited lifespan is exhausted rapidly. In practice, however, a large number of cold pages are never re-accessed after they are admitted to the cache. This work uses a new hot page identification scheme to prevent cold pages from entering the SSD-based cache. As a result, the lifespan of SSDs is significantly extended. The foundation of the new hot page identification scheme is a data structure called UCBF (Ultra Counting Bloom Filter).
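The abstract does not describe the internals of UCBF, so the sketch below uses a plain textbook counting Bloom filter to show how per-page hotness can be estimated in a fixed amount of memory. The class name, the parameters (`num_counters`, `num_hashes`, `max_count`) and the use of SHA-256 for hashing are all assumptions made for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Textbook counting Bloom filter used as an approximate access counter (not UCBF)."""

    def __init__(self, num_counters=1024, num_hashes=3, max_count=15):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes
        self.max_count = max_count  # counters saturate, as a 4-bit counter would

    def _slots(self, key):
        # Derive num_hashes counter indices from salted SHA-256 digests of the key.
        slots = set()
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            slots.add(int.from_bytes(digest[:8], "big") % len(self.counters))
        return slots

    def record_access(self, key):
        for slot in self._slots(key):
            if self.counters[slot] < self.max_count:
                self.counters[slot] += 1

    def hotness(self, key):
        # The smallest counter is the estimate; collisions can only inflate it.
        return min(self.counters[slot] for slot in self._slots(key))


cbf = CountingBloomFilter()
for _ in range(5):
    cbf.record_access("page-7")
print(cbf.hotness("page-7"))
```

Because counters are shared between pages, the estimate never undercounts (below saturation) but may overcount, and memory use depends on the number of counters rather than on the number of distinct pages seen.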
UCBF computes the hotness of each page with limited memory overhead. The hot page identification scheme maintains a watermark: a page whose hotness exceeds the watermark is allowed to enter the SSD-based cache. The watermark adapts to the workload dynamically and continuously. Experimental results demonstrate that the hot page identification scheme extends the lifespan of SSDs by a factor of 6 and improves the cache hit ratio by more than 10%.

(4) A memory-efficient data structure used to implement replacement policies for large caches

An important feature of SSD-based cache is its large capacity. A replacement policy for such a large cache usually introduces excessive memory overhead. This work develops a new data structure that implements the LRU policy with limited memory overhead. Because LRU is the foundation of most more complex cache replacement policies, those policies can also be implemented with limited memory overhead via the new data structure. The proposed data structure implements the LRU policy via a FIFO queue and a Bloom filter. The FIFO queue is kept on the SSD and thus introduces no memory overhead. The Bloom filter is kept in memory, but is known to be memory-efficient. As a result, the combination of FIFO queue and Bloom filter reduces the memory overhead of the cache replacement policy by about a factor of 10. The Bloom filter required here must support element deletion, but existing Bloom filters cannot support element deletion while remaining memory-efficient. This work therefore proposes a memory-efficient Bloom filter that supports element deletion, to serve the new data structure used to implement cache replacement policies.

(5) The data prefetching and cache replacement policies used to extend main memory by SSDs

Large-scale data processing usually requires a large main memory, but DRAM is ill-suited to building such a large memory in terms of density, price and energy efficiency.
As SSDs are comparable with DRAM in terms of bandwidth and throughput, this work proposes to extend main memory with SSDs. However, the latency of SSDs is much higher than that of DRAM, so the hybrid memory should guarantee that most I/O requests are served by DRAM rather than by SSDs. This work therefore takes two measures to improve the hit ratio in DRAM. The first is an adaptive prefetching policy that prefetches data blocks from SSDs to DRAM. The second is a cost-based cache replacement policy that manages DRAM. The adaptive prefetching policy aims to ensure that data blocks are already in DRAM when they are demanded. When DRAM is full, the cost-based replacement policy gives higher priority to evicting data blocks that are easy to prefetch. Whether a data block is easy to prefetch is determined by its access pattern: if a file exhibits a strong access pattern, the data blocks belonging to it are easy to prefetch. This work therefore proposes a novel pattern recognition scheme to support the above prefetching and replacement policies.
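The link between access patterns and eviction priority can be sketched as follows. The abstract does not specify the pattern recognition scheme, so this example substitutes a simple stride detector: a file's pattern strength is the fraction of its consecutive accesses that follow its most common stride. The functions `pattern_strength` and `pick_eviction_victim`, and the file names, are hypothetical.

```python
from collections import Counter


def pattern_strength(block_numbers):
    """Fraction of consecutive accesses that follow the file's most common stride (0.0 to 1.0)."""
    if len(block_numbers) < 3:
        return 0.0  # too little history to call anything a pattern
    strides = [b - a for a, b in zip(block_numbers, block_numbers[1:])]
    _, count = Counter(strides).most_common(1)[0]
    return count / len(strides)


def pick_eviction_victim(cached_blocks, history):
    """Evict the cached block whose file is easiest to prefetch again.

    cached_blocks: list of (file_name, block_number) pairs held in DRAM.
    history: dict mapping file_name to the block numbers accessed so far.
    """
    return max(cached_blocks, key=lambda blk: pattern_strength(history.get(blk[0], [])))


history = {
    "scan.dat": [0, 1, 2, 3, 4, 5],      # sequential scan, strong pattern
    "index.db": [17, 3, 42, 8, 29, 11],  # irregular accesses, weak pattern
}
cached = [("index.db", 42), ("scan.dat", 4)]
victim = pick_eviction_victim(cached, history)
print(victim)
```

In this sketch the block from the sequentially scanned file is evicted first, since a prefetcher can bring it back cheaply, while the block from the irregularly accessed file stays in DRAM.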
Keywords/Search Tags: Flash Memory, FTL, SSD, SSD-based Cache, Cache Replacement Policy, Hybrid Memory, Hot Data Identification, Data Prefetching