Font Size: a A A

Optimization For LSM-tree Based KV Stores

Posted on:2020-06-15Degree:DoctorType:Dissertation
Country:ChinaCandidate:W T ZhangFull Text:PDF
GTID:1368330575466585Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
In recent years,with the increasing proportion of unstructured data and the growing number of data-intensive application scenarios,more and more companies are adopting KV stores to support the data services of their upper-layer applications.However,as the amount of data and user traffic continue to grow,the KV store is also facing performance challenges.This thesis mainly studies the optimization of read and write performance in the KV store system based on LSM-tree.Specifically,it reduces the write amplification of the system to improve the write performance,and reduces the overall false positive rate of the Bloom filter to improve the read performance.Besides,it also performs related optimization for frequently update scenarios.The main research contents and contributions of this thesis are as follows:1)Optimize write performance of LSM-tree based KV storeWith the increasing amount of data and write-intensive applications,there are more and more enterprises choosing LSM-tree based KV store as the underlying storage en-gine.Despite the good write performance of LSM-tree,in the face of increasing heavy user write requests,the system still needs to continuously improve its write performance.LSM-tree needs to sort the data in the external storage regularly,while this operation causes serious write amplification problem.On the one hand,the write amplification problem will cause the overall throughput of the system to decrease;On the other hand,if SSD is used for external storage,severe write amplification will greatly shorten its lifetime,which will affect the reliability of the system and bring high storage overhead.Therefore,reducing write amplification can save more I/O to serve user requests,and it may also reduce data storage costs.To relieve this issue,this thesis propose a new data structure GLS,and on this basis,a new data maintenance mode is established.Through theoretical analysis,GLS can effectively reduce write amplification compared to LSM-tree.Subsequently,this thesis implemented a KV store called FlameDB based on GLS.Through experiments,FlameDB effectively reduced the write amplification of the original system and improved the write performance effectively.2)Bloom filter optimization for KV storeWith the dramatic increase in user traffic,the KV store will face new challenges in terms of read performance.In order to improve the read performance of the system,many KV stores are equipped with Bloom filter to avoid unnecessary external storage access.However,these traditional Bloom filter allocation strategies do not fully con-sider the access characteristics of the data's placement,so they are not efficient.To this end,this thesis designed a novel Bloom filter allocation strategy for data access features.This thesis first analyzes the access characteristics of the data stored by LSM-tree based KV store under uniform query.A layered heterogeneous Bloom filter allocation strat-egy is proposed for this scenario.Through modeling analysis and experimental testing,compared with the traditional Bloom filter allocation strategy,the hierarchical hetero-geneous Bloom filter allocation strategy can effectively reduce the number of external accesses during data query,and improve the read performance of the system.Then for the arbitrary distributed query requests,this thesis proposes a general Bloom filter allo-cation strategy.Through experimental tests,the general Bloom filter allocation strategy has better read performance than the traditional Bloom filter allocation strategy.3)Performance optimization for KV stores with frequently update scenariosThe current KV store has become a fundamental part of the storage system,and played an important role in dealing with data-intensive services.In order to achieve higher write performance,many KV stores are usually based on data structures such as LSM-tree?However,in the scenario where data is frequently updated,KV stores based on LSM-tree will face serious write amplification problems.In addition,since the up-dated data will be directly written to the external storage,when the old and new versions of the data are not merged in time,it will bring space amplification.The space amplifi-cation problem will make the old data and its corresponding index data occupy a large amount of external storage and memory space.This in turn affects the performance of the system.To this end,this thesis designed the DedupKV prototype system for frequently update scenarios.DedupKV includes two modules:Hot Cache and Dedup-Compaction.Hot Cache can absorb some update operations in memory,and reduce write amplification by avoiding writing this portion of the update operation to exter-nal memory.The DedupCompaction module reduces the space amplification by timely recycling the old version of the data.
Keywords/Search Tags:KV store, LSM-tree, Write amplification, Bloom filter, Space ampli-fication
PDF Full Text Request
Related items