With the rapid development and widespread application of information technology, the volume of data worldwide is growing at an astonishing rate. Because this data is massive in scale, largely unstructured, and dominated by small items, efficient data storage and management have become a major challenge for storage systems. Key-value (KV) stores based on the Log-Structured Merge tree (LSM-tree) are widely used as underlying storage engines for their good write performance. However, the mismatch between the data management granularity and the granularity of application hot data, the stalling of user writes caused by blocked flush operations, and the unbalanced read and write performance under small-data workloads mean that LSM-tree based KV stores still face great challenges in delivering high performance and scalability. To address these problems, this thesis rethinks the design of the LSM-tree and conducts in-depth research on data organization, the read/write critical paths, and merge strategies. Combining the workload characteristics of typical applications, this thesis makes the following innovative contributions.

The rapid growth of data volume forces part of the metadata in LSM-tree based KV stores onto disk, so the disk I/O required to load metadata severely degrades read performance. To address this problem, and considering the skewed distribution of read requests in modern applications, a segmented metadata management method is proposed to achieve fine-grained metadata management and caching. It alleviates the memory waste and excess disk I/O caused by the mismatch between data management granularity and hot-data granularity, thereby improving read performance. The method reconstructs the metadata region of the data file: index data is sliced into multiple segments when a data file is created, and a segment filter is built over all KV pairs managed by each segment, which avoids the overhead of querying the index merely to locate filter positions. In addition, an efficient segmented metadata caching strategy decouples the cache management of file operation pointers from that of metadata, using two separate caches. By reducing the cost of loading metadata and by caching more segmented metadata under the same memory budget, the approach improves read performance. Experimental results show that, compared with LevelDB, the segmented metadata management approach reduces read latency by 89.4% when the metadata cache holds 25% of the entire metadata volume, without affecting write performance.
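To make the segmented lookup path concrete, the following C++ sketch shows how a point query might first locate the metadata segment covering a key and then probe only that segment's filter before any index data is loaded. It is a minimal illustration under assumed structures and names (SegmentedMeta, MetaSegment, SegmentFilter are hypothetical and are not taken from the thesis or from LevelDB's code):

    // Minimal sketch of a per-segment metadata lookup (illustrative only;
    // all names are hypothetical, not LevelDB's actual structures).
    #include <algorithm>
    #include <cstdint>
    #include <functional>
    #include <iostream>
    #include <string>
    #include <vector>

    // A tiny Bloom-style filter standing in for the per-segment filter.
    struct SegmentFilter {
        std::vector<bool> bits = std::vector<bool>(1024, false);
        void Add(const std::string& key) {
            for (uint32_t i = 0; i < 3; ++i)
                bits[(std::hash<std::string>{}(key) + i * 0x9e3779b9u) % bits.size()] = true;
        }
        bool MayContain(const std::string& key) const {
            for (uint32_t i = 0; i < 3; ++i)
                if (!bits[(std::hash<std::string>{}(key) + i * 0x9e3779b9u) % bits.size()])
                    return false;
            return true;
        }
    };

    // One metadata segment: the largest key it covers plus a filter over
    // the KV pairs it manages; its index slice is loaded only on demand.
    struct MetaSegment {
        std::string max_key;   // upper bound of this segment's key range
        SegmentFilter filter;  // filter over all keys managed by the segment
    };

    struct SegmentedMeta {
        std::vector<MetaSegment> segments;  // sorted by max_key

        // Locate the segment whose key range may hold `key`, then consult
        // only that segment's filter, avoiding an index lookup just to
        // find the filter position. Returns the segment to load, or -1.
        int ProbeSegment(const std::string& key) const {
            auto it = std::lower_bound(segments.begin(), segments.end(), key,
                [](const MetaSegment& s, const std::string& k) { return s.max_key < k; });
            if (it == segments.end() || !it->filter.MayContain(key)) return -1;
            return static_cast<int>(it - segments.begin());
        }
    };

    int main() {
        SegmentedMeta meta;
        meta.segments.resize(2);
        meta.segments[0].max_key = "m";
        meta.segments[1].max_key = "z";
        meta.segments[0].filter.Add("apple");
        meta.segments[1].filter.Add("pear");
        std::cout << meta.ProbeSegment("apple") << "\n";  // 0: load segment 0 only
        std::cout << meta.ProbeSegment("grape") << "\n";  // almost surely -1: no metadata I/O
    }

Only when the per-segment filter reports a possible hit does the store need to fetch that segment's index slice from the cache or from disk, which is where the reduction in metadata-loading I/O comes from in this design.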
In the LSM-tree, frequent merge operations can block flush operations, which in turn stalls foreground write requests and hurts write performance. To address this problem, a key prefix-based data partition management method is proposed to enlarge the range of sequentially written data in the LSM-tree and to reduce the probability of overlapping key ranges between data files, thereby reducing the number of merge operations and mitigating the impact of background operations on foreground writes. During the flush process, the method splits the data written to the first level of the LSM-tree's disk component into multiple partitions according to key prefixes, and manages the first level in partitions whose key ranges do not overlap (a brief sketch of this partitioning step is given at the end of this abstract). Merge operations at the first level are performed by polling the partitions in turn, so that data is written sequentially to the lower level. When the amount of sequentially written data exceeds the capacity of adjacent levels, the number of merge operations drops sharply, which mitigates the interference of merge operations with flush operations. Experimental results show that, compared with HyperLevelDB, the key prefix-based data partition management method reduces the number of background merge operations by 80% and improves write performance by 15.8%~52%.

The existing LSM-tree design does not balance read and write performance well, which makes it difficult for LSM-tree based KV stores to provide high performance and scalability under workloads dominated by small KV pairs with large numbers of read and write requests. To address this problem, a read-write balanced data merge strategy is proposed, consisting of a hybrid tiering merge mechanism and a concurrency mechanism for background operations; on the premise of low write amplification, it further reduces read amplification and mitigates the impact of background operations on foreground I/Os. For merge operations between adjacent levels of the LSM-tree, the hybrid tiering merge mechanism combines data sorting with data moving: the SSTables involved in a merge operation are divided into a sorting subset and a moving subset, where data in the sorting subset must be reordered and written to the next level, while data in the moving subset only requires a state change in the system version, without rewriting any data. For the largest level of the LSM-tree, the mechanism adopts a read-aware intra-level merge policy that merges data within frequently accessed key ranges, eliminating data redundancy in hot key ranges and reducing read amplification. The concurrency mechanism for background operations reduces the impact of background operations on foreground I/Os by executing the flush, the first-level merge, and the merges of the other levels concurrently in three dedicated background threads. Experimental results show that, compared with PebblesDB, the read-write balanced data merge strategy achieves a better balance between read and write performance, reducing the average latency of read and write requests by 10%~54% under the YCSB benchmarks.
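The division of a merge's inputs into sorting and moving subsets can be illustrated with a short C++ sketch. It shows only the classification step, under the assumption that each SSTable is summarized by its smallest and largest key; the structure and function names (TableRange, SplitForHybridMerge) are hypothetical rather than the thesis's actual implementation:

    // Minimal sketch of splitting a merge's input SSTables into a sorting
    // subset and a moving subset (illustrative; names are assumptions).
    #include <iostream>
    #include <string>
    #include <vector>

    struct TableRange { std::string smallest, largest; };

    static bool Overlaps(const TableRange& a, const TableRange& b) {
        return !(a.largest < b.smallest || b.largest < a.smallest);
    }

    // Tables whose key range overlaps nothing in the next level join the
    // moving subset (a version-metadata change only), while overlapping
    // tables form the sorting subset that must be re-sorted and rewritten.
    void SplitForHybridMerge(const std::vector<TableRange>& inputs,
                             const std::vector<TableRange>& next_level,
                             std::vector<TableRange>* sorting_subset,
                             std::vector<TableRange>* moving_subset) {
        for (const auto& t : inputs) {
            bool overlap = false;
            for (const auto& n : next_level)
                if (Overlaps(t, n)) { overlap = true; break; }
            (overlap ? sorting_subset : moving_subset)->push_back(t);
        }
    }

    int main() {
        std::vector<TableRange> inputs = {{"a", "c"}, {"k", "p"}};
        std::vector<TableRange> next   = {{"b", "d"}};
        std::vector<TableRange> sorting, moving;
        SplitForHybridMerge(inputs, next, &sorting, &moving);
        std::cout << "sort " << sorting.size()
                  << ", move " << moving.size() << "\n";  // sort 1, move 1
    }

Tables in the moving subset can be promoted by editing version metadata alone; this is how the hybrid tiering merge avoids rewriting such data and keeps write amplification low.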
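For the second contribution, the key prefix-based partitioning performed during flush can likewise be illustrated with a minimal sketch. It assumes a flushed run of sorted KV pairs and a fixed prefix length; PartitionByPrefix is a hypothetical name, and a real implementation would build SSTables per partition rather than in-memory vectors:

    // Minimal sketch of splitting a flushed sorted run into key-prefix
    // partitions (illustrative; prefix length and names are assumptions).
    #include <cstddef>
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    using Run = std::vector<std::pair<std::string, std::string>>;  // sorted KV pairs

    // Group a flushed run into partitions keyed by a fixed-length key
    // prefix. Partitions have disjoint key ranges by construction, so a
    // later merge can poll them one at a time and write the results
    // sequentially into the next level.
    std::map<std::string, Run> PartitionByPrefix(const Run& flushed, std::size_t prefix_len) {
        std::map<std::string, Run> partitions;
        for (const auto& kv : flushed)
            partitions[kv.first.substr(0, prefix_len)].push_back(kv);
        return partitions;
    }

    int main() {
        Run flushed = {{"ant", "1"}, {"apple", "2"}, {"bee", "3"}, {"cat", "4"}};
        auto parts = PartitionByPrefix(flushed, 1);
        for (const auto& [prefix, run] : parts)
            std::cout << "partition '" << prefix << "': " << run.size() << " keys\n";
        // A round-robin (polling) merge would pick one partition per merge
        // operation, keeping key ranges disjoint across partitions.
    }

Because the partitions' key ranges are disjoint by construction, polling one partition per merge keeps the data flowing into the lower level sequential, which is the property the method relies on to reduce the number of merge operations.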