
Research on SSD-Based Cache Techniques for Distributed Block Storage Systems

Posted on: 2016-11-06
Degree: Master
Type: Thesis
Country: China
Candidate: S W Zeng
GTID: 2348330479453373
Subject: Computer system architecture
Abstract/Summary:
Driven by cloud computing and big data research, distributed block storage systems (DBSS) are becoming increasingly important. Prevalent distributed block storage systems include Petal, Sheepdog, Parallax, BLAST, and DHTbd. This thesis proposes a cache system called GSDC (Global Shared Dynamic Cache), which is built on SSDs and uses the characteristics of the distributed environment to improve DBSS performance.
GSDC adopts several cache nodes, each of which stores a certain amount of hot data. Together, the cache nodes form a "cache level" that is shared by all data nodes in the original system. To manage the numerous cache nodes, they are distributed across virtual cache nodes by means of a consistent hashing table.
Once cache data has been placed on its corresponding cache node, a dynamic algorithm manages it. Based on the characteristics of the DBSS, the cache data is divided into several data sets according to their properties, and each data set is managed by a queue. The algorithm dynamically adjusts the storage available to each queue according to the real-time access to its data set. To obtain better performance, an elimination upper bound and an elimination lower bound are set for every queue, and both bounds are adjusted dynamically based on real-time access. When the cache uses more storage than the threshold, data in the data set under the lighter workload is eliminated first; the algorithm also considers whether the length of the corresponding queue exceeds its elimination upper bound.
To speed up data elimination, dirty data in the cache is synchronized to the data nodes while the system is idle. In addition, the redundant storage on the data nodes is used to add a log back function that increases the reliability of the system's original data. The prototype is built on Sheepdog, an open-source DBSS.
In tests, GSDC improves performance by 20% to 270% for accessing different data sets, compared with the original system without a cache.
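The abstract states that cache nodes are organized through virtual cache nodes on a consistent hashing table, but it does not give the details. The following is a minimal sketch of that general technique, not the thesis's implementation. The class name `CacheRing`, the MD5-based ring hash, the default of 64 virtual nodes per cache node, and the block and node identifiers are all assumptions made for illustration.

```python
import bisect
import hashlib


def _ring_hash(key: str) -> int:
    """Map a string to a ring position (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class CacheRing:
    """Hypothetical consistent-hash ring that maps block IDs to cache nodes."""

    def __init__(self, vnodes_per_node: int = 64):
        self.vnodes_per_node = vnodes_per_node  # assumed value, not from the thesis
        self._points = []   # sorted ring positions of all virtual nodes
        self._owner = {}    # ring position -> physical cache node name

    def add_node(self, node: str) -> None:
        """Place the node's virtual nodes on the ring."""
        for i in range(self.vnodes_per_node):
            point = _ring_hash(f"{node}#{i}")
            if point in self._owner:
                continue  # skip the (very unlikely) position collision
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        """Remove every virtual node that belongs to the given cache node."""
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, block_id: str) -> str:
        """Return the cache node owning the first virtual node clockwise of the block."""
        if not self._points:
            raise LookupError("ring has no cache nodes")
        i = bisect.bisect_right(self._points, _ring_hash(block_id))
        return self._owner[self._points[i % len(self._points)]]


ring = CacheRing()
for name in ("cache-a", "cache-b", "cache-c"):
    ring.add_node(name)
print(ring.lookup("vdi-1/object-42"))
```

With this layout, removing a cache node only remaps the blocks that node was holding; blocks assigned to the remaining nodes keep their placement, which is the usual reason for choosing consistent hashing when cache nodes join or leave.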
Keywords/Search Tags:distributed block storage system, cache, shared, dynamic