Font Size: a A A

Research On Optimizing Large-Scale Key--Value Stores Based On LSM--trees

Posted on:2020-04-06Degree:DoctorType:Dissertation
Country:ChinaCandidate:F MeiFull Text:PDF
GTID:1368330614456089Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
The development of big data demands higher performance on large-scale storage system,in order to efficiently process massive data traffic.Since write optimization structures such as Log-structured Merge-trees(LSM-trees)can achieve high throughput,they have become the backbone in large-scale storage environment.LSM-trees achieve high performance by aggregating small random writes into large sequential writes that can take full advantage of the external storage device's bandwidth.In recent years,computer hardware technologies also evolve rapidly: on the one hand,multi-core pro-cessors have become widespread? on the other hand,non-volatile memories supporting concurrent IO,especially flash-based SSDs,have been widely deployed and are gradually replacing the traditional mechanical hard drive.In the face of the explosive growth of data and the characteristics of new hard-wares,the LSM-tree-based applications running on the traditional storage architecture encounter great challenges.With regard to the above situation,it is necessary to comprehensively consider and optimize the LSM-trees to meet the requirements of large-scale storage from aspects such as processing logic,stor-ing routine,and storage medium.Large amount of work has been carried out in academia and industry,including customizing LSM-trees for specific workloads,changing LSM-tree compaction strategies to reduce write amplification,designing LSM-trees for flash-based SSDs or other non-volatile memo-ries,etc.This paper focuses on the deployment and research work of LSM-trees from three aspects,as memory buffer structure,IO processing and overall organizational structure,and proposes effective solutions.First,an LSM-tree uses a block of memory to buffer new data,which is typically indexed by a skip list structure.Because the memory buffer structure is in the critical path of read and write request processing,its performance is critical to the development of large-scale storage system.Current skip lists have two types: probabilistic skip list and deterministic skip list.The former has low maintenance cost but resulting in unstable index structure,while the latter is the opposite.In this paper,a new skip list called Bod Skip is proposed to create index nodes on demand during the search process,so as to improve the performance of the memory buffer structure.Bod Skip determines whether an index node is needed to be created based on the number of traversal steps,rather than relying on random probabilities,hence generating stable indexes.At the same time,the index nodes are generated according to the searchhistory information,thus the maintenance overhead caused by creating unnecessary index nodes is avoided.When adding a new node,Bod Skip only involves updating two adjacent nodes before and after the new node,bringing advantages in concurrent operations.Secondly,modern implementations of LSM-trees rely on the traditional storage hierarchy ”appli-cation/file system/block device”,and the data managed by an LSM-tree is persisted on the block device through a file system.However,the involvement of the file system layer seriously affects the utiliza-tion of the device bandwidth for LSM-trees.This paper deeply analyzes the IO characteristics of three typical file systems in storing the data generated by LSM-trees,from which we find that the reason of low device bandwidth utilization is that file systems generate large number of small IOs,and exper-iments are conducted to verify and quantify the IO overhead caused by the file system layer.Based on this founding,we propose the LSM-tree direct storage system(LDS).The principle of the direct storage technology is using the LSM-tree to manage the external memory space so as to completely eliminate the IO overhead introduced by the file system layer.Finally,numerous multi-stage structures besides LSM-trees are proposed in recent years,but there lacks a taxonomic system,leading to the deployment and research work inefficient and difficult.Aim-ing at this problem,this paper comprehensively analyzes the basic characteristics of multi-stage struc-tures,and proposes a ”tree/forest” taxonomy model.Furthermore,we explore design space still ex-isting in multi-stage structures under the guidance of the taxonomy,and propose a high performance key-value store called Sifr DB based on the forest model.The ”tree/forest” taxonomy model breaks through the conception limitation of the LSM-tree,and is the first that explains the root cause of per-formance preferences in multi-stage structures,by which the various implementations based on multi-stage structures are strictly differentiated based on their basic behaviors.With the taxonomy,users can make best choice for deployments and researchers can have better understanding on multi-stage structures when conducting related research work.The proposed key-value store Sifr DB absorbs the advantages of systems implemented on different models,while avoiding their shortcomings.Specif-ically,Sifr DB is able to run on LDS seamlessly so as to benefit from the direct storage technology we proposed.Moreover,in Sifr DB an efficient parallel search algorithm is designed to solve the read degradation problem that the forest model suffers particularly more.The algorithm fully exploits the SSD internal parallelism to improve performance of query requests.In summary,the above research comprehensively examines the overall architecture and data pro-cessing process of the traditional LSM-treeds,and analyzes the key performance issues.The proposedoptimization methods and implementation solutions maximize the potential of hardware resources by integrating the application logic and hardware features,so as to comprehensively improve the LSM-tree performance in large-scale storage environment.
Keywords/Search Tags:Big Data, Key--Value Store, LSM-tree, Parallel Algorithm, File System, Write Amplification, Data Durability
PDF Full Text Request
Related items