Font Size: a A A

Research On Optimizing Large-scale Key-value Stores Based On LSM-trees

Posted on:2020-05-23Degree:DoctorType:Dissertation
Country:ChinaCandidate:F MeiFull Text:PDF
GTID:1368330590958918Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
The development of big data demands higher performance on large-scale storage system.Since write optimization structures such as Log-structured Merge-trees(LSM-trees)can achieve high throughput,they have become the backbone in large-scale storage environment.LSM-trees achieve high performance by aggregating small random writes into large sequential writes that can take full advantage of the external storage device's bandwidth.In recent years,computer hardware technologies also evolve rapidly: on the one hand,multi-core processors have become widespread;on the other hand,non-volatile memories supporting concurrent IO,especially flash-based SSDs,have been widely deployed and are gradually replacing the traditional mechanical hard drive.In the face of the explosive growth of data and the characteristics of new hardwares,the LSM-tree-based applications running on the traditional storage architecture encounter great challenges.With regard to the above situation,it is necessary to comprehensively consider and optimize the LSM-trees to meet the requirements of large-scale storage from aspects such as processing logic,storing routine,and storage medium.Large amount of work has been carried out in academia and industry,including customizing LSM-trees for specific workloads,changing LSM-tree compaction strategies to reduce write amplification,designing LSM-trees for flash-based SSDs or other non-volatile memories,etc.This paper focuses on the deployment and research work of LSM-trees from three aspects,as memory buffer structure,IO processing and overall organizational structure,and proposes effective solutions.First,an LSM-tree uses a block of memory to buffer new data,which is typically indexed by a skip list structure.Because the memory buffer structure is in the critical path of read and write request processing,its performance is critical to the development of large-scale storage system.Current skip lists have two types: probabilistic skip list and deterministic skip list.The former has low maintenance cost but resulting in unstable index structure,while the latter is the opposite.In this paper,a new skip list called BodSkip is proposed to create index nodes on demand during the search process,so as to improve the performance of the memory buffer structure.BodSkip determines whether an index node is needed to be created based on the number of traversal steps,rather than relying on random probabilities,hence generating stable indexes.At the same time,the index nodes are generated according to the search history information,thus the maintenance overhead caused by creating unnecessary index nodes is avoided.When adding a new node,BodSkip only involves updating two adjacent nodes before and after the new node,bringing advantages in concurrent operations.Secondly,modern implementations of LSM-trees rely on the traditional storage hierarchy ”application/file system/block device”,and the data managed by an LSM-tree is persisted on the block device through a file system.However,the involvement of the file system layer seriously affects the utilization of the device bandwidth for LSM-trees.This paper deeply analyzes the IO characteristics of three typical file systems in storing the data generated by LSM-trees,from which we find that the reason of low device bandwidth utilization is that file systems generate large number of small IOs,and experiments are conducted to verify and quantify the IO overhead caused by the file system layer.Based on this founding,we propose the LSM-tree direct storage system(LDS).The principle of the direct storage technology is using the LSM-tree to manage the external memory space so as to completely eliminate the IO overhead introduced by the file system layer.Finally,numerous multi-stage structures besides LSM-trees are proposed in recent years,but there lacks a taxonomic system,leading to the deployment and research work inefficient and difficult.Aiming at this problem,this paper comprehensively analyzes the basic characteristics of multi-stage structures,and proposes a ”tree/forest” taxonomy model.Furthermore,we explore design space still existing in multi-stage structures under the guidance of the taxonomy,and propose a high performance key-value store called SifrDB based on the forest model.The ”tree/forest” taxonomy model breaks through the conception limitation of the LSM-tree,and is the first that explains the root cause of performance preferences in multi-stage structures,by which the various implementations based on multistage structures are strictly differentiated based on their basic behaviors.With the taxonomy,users can make best choice for deployments and researchers can have better understanding on multi-stage structures when conducting related research work.The proposed key-value store SifrDB absorbs the advantages of systems implemented on different models,while avoiding their shortcomings.Specifically,SifrDB is able to run on LDS seamlessly so as to benefit from the direct storage technology we proposed.Moreover,in SifrDB an efficient parallel search algorithm is designed to solve the read degradation problem of multi-stage structures,which fills the gap that current multi-stage structures fails to effectively use SSD internal parallelism to improve performance of query requests.In summary,the above research comprehensively examines the overall architecture and data processing process of the traditional LSM-treeds,and analyzes the key performance issues.The proposed optimization methods and implementation solutions maximize the potential of hardware resources by integrating the application logic and hardware features,so as to comprehensively improve the LSMtree performance in large-scale storage environment.
Keywords/Search Tags:Big Data, Key-Value Store, LSM-tree, Parallel Algorithm, File System, Write Amplification, Data Durability
PDF Full Text Request
Related items