Font Size: a A A

Read And Write Performance Optimization For LSM-tree Based Key-Value Stores

Posted on:2022-03-03Degree:MasterType:Thesis
Country:ChinaCandidate:H C WangFull Text:PDF
GTID:2518306725981399Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Key-value stores are increasingly applied to today's social network,e-commerce and other fields to provide high-performance storage and access services for massive unstructured data.LSM-tree(Log-Structured Merge tree),as the most popular underlying data structure for persistent key-value stores,utilizes the high sequential IO bandwidth of disks and exhibits extremely high write performance and acceptable read performance.As the core operation of LSM-tree,compaction maintains the hierarchical structure of the system and regularly reclaims the disk space occupied by old data.However,it frequently rewrites the files on the disk,results in excessive write amplification,restricts the improvement of write throughput,and triggers a large number of erasure operations,which reduce the service life of SSD.Its inefficient sorting scheme blocks the flush operations of memory data and results in write stalls.On the other hand,under mixed read-write workloads,compactions,which are frequently triggered by write operations,constantly reorganize the disk files,result in block cache misses and reduce the read performance.In response to the above problems,this thesis proposes a file-based segmented compaction algorithm,and proposes a series of optimizations to reduce the impact of compactions on read performance.The specific contributions are as follows:1.To solve the problem of excessive write amplification caused by compactions,this thesis proposes a segmented compaction algorithm,which merges multiple lowdensity segments into one high-density segment,greatly improves the efficiency of data inflow to the lower level,and cuts the compaction tasks according to the data range of the lower level files to reduce the number of files that participate in sorting.The algorithm is executed at file granularity,which avoids large performance jitter and extremely high temporary disk space occupation caused by large compaction tasks,and decouples the number of segments from the growth factor,provides a broader adjustment space for the system.Different levels are treated differently,and a parallel read algorithm is used to reduce the query overhead of multiple segments.This thesis models the write amplification of optimized LSM-tree,which considers that duplicate data is deleted during the MemTable insertion process and compaction sorting process,and the write amplification reduction caused by the skew of data distribution in the hierarchy,so as to accurately describe the system write overhead,and solve the optimal parameters of the system to maximize the write performance.2.Aiming at the block cache invalidation problem caused by compactions,this thesis proposes a number of cache optimization algorithms.It includes reusing the data blocks of compaction input files,making corresponding block cache items serve the read operations of input files and output files at the same time;actively identifying and caching hot data blocks of output files;adopting file-based multi-version control and read redirection to realize fine-grained snapshot reads,which reduces cache contention,preheats output file cache and reduces the number of segment queries;using existing data block cache to establish a lightweight key-value pair cache,which reduces disk IO and the query overhead of indexs and Bloom filters.This thesis builds a new LSM-tree based key-value store SLDB on the basis of LevelDB,which applies the above optimization methods.Experimental results show that SLDB can improve the write throughput of LSM-tree by more than two times,greatly reduce the write stalls,and effectively improve the cache hit rate.
Keywords/Search Tags:LSM-tree, Key-Value Store, Compaction, Write Amplification, Parameter Optimization, Cache Optimization
PDF Full Text Request
Related items