
Optimization On Key Value Database With Global Invalid Data Awareness

Posted on: 2022-02-06  Degree: Master  Type: Thesis
Country: China  Candidate: J Yang  Full Text: PDF
GTID: 2518306572496954  Subject: Computer technology
Abstract/Summary:
Key-value (KV) storage systems based on the log-structured merge-tree (LSM-tree) are widely used because of their high write performance and scalability. Because an LSM-tree key-value store updates data out of place, different versions of the key-value pair for the same key may be stored in different levels of the LSM-tree. The storage engine removes invalid data (old versions of key-value pairs that no snapshot still needs) through Compaction (merge sort). However, each Compaction can identify and delete invalid data only on the basis of the local data that takes part in that Compaction. Invalid data may therefore stay in the system for a long time and take part in many subsequent Compactions, which degrades read and write performance and wastes storage space. For applications that update data uniformly, the impact of invalid data cannot be ignored.
To solve this problem, GidaDB (Global Invalid Data Awareness Database), a KV storage system, is designed and implemented. GidaDB uses the newer data in the LSM-tree's immutable in-memory component to detect old data in the SSTables (Sorted String Tables) in levels L1 to LN, and records that old data in an old data information table. A dedicated old-data-detection thread performs this work, synchronized with the Compaction operation. During Compaction, GidaDB uses the old data information table to detect old data in the SSTables taking part in the Compaction and, combined with snapshot information, identifies and deletes the invalid data among them. This reduces both the extra I/O caused by invalid data taking part in repeated Compactions and the extra space that invalid data occupies.
Test results show that, compared with LevelDB, GidaDB reduces write amplification by 25.1%, improves write throughput by 25.4%, reduces space consumption by 34.3%, and reduces read latency by 19.6%. GidaDB was also deployed on top of LevelDB together with ALDC, a recent write amplification optimization scheme. Compared with ALDC, Gida-ALDC reduces write amplification by 10.7%, improves write throughput by 8.9%, reduces space consumption by 22.1%, and reduces read latency by 13.0%. These results indicate that the GidaDB scheme can further improve the space utilization and read-write performance of the ALDC scheme to some extent.
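The difference between local and global invalid-data detection described in the abstract can be illustrated with a small Python sketch. It is a toy model, not GidaDB's implementation: the names `Entry`, `detect_old_data`, and `compact`, the dictionary layout of the old data information table, and the rule that a snapshot at sequence number `s` needs the latest version with sequence number at most `s` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    key: str
    seq: int    # larger sequence number = newer version (assumed convention)
    value: str


def detect_old_data(immutable_memtable, sstables, old_table=None):
    """Hypothetical old-data detection step.

    For every SSTable entry that has a newer version in the immutable
    in-memory component, record the sequence number of that newer version.
    """
    old_table = {} if old_table is None else old_table
    newest = {}
    for e in immutable_memtable:
        newest[e.key] = max(e.seq, newest.get(e.key, e.seq))
    for table in sstables:
        for e in table:
            newer_seq = newest.get(e.key)
            if newer_seq is not None and newer_seq > e.seq:
                old_table[(e.key, e.seq)] = newer_seq
    return old_table


def compact(inputs, old_table, snapshots, use_global=True):
    """Merge-sort the input tables and drop entries that are invalid.

    An entry is dropped when a newer version is known (locally, or through
    the old data table when use_global is True) and no snapshot falls
    between the entry and that newer version.
    """
    entries = sorted((e for t in inputs for e in t),
                     key=lambda e: (e.key, -e.seq))
    out = []
    prev = None  # next-newer version of the same key within this compaction
    for e in entries:
        newer = []
        if prev is not None and prev.key == e.key:
            newer.append(prev.seq)
        if use_global and (e.key, e.seq) in old_table:
            newer.append(old_table[(e.key, e.seq)])
        prev = e
        if newer:
            n = min(newer)
            pinned = any(e.seq <= s < n for s in snapshots)
            if not pinned:
                continue  # invalid data: superseded and needed by no snapshot
        out.append(e)
    return out


# "a" has a newer version only in memory, so a compaction of L1 and L2
# cannot see that the on-disk version is old unless it consults the table.
imm = [Entry("a", 5, "a-new")]
l1 = [Entry("a", 1, "a-old"), Entry("b", 2, "b")]
l2 = [Entry("c", 3, "c")]
old_table = detect_old_data(imm, [l1, l2])

local_only = compact([l1, l2], old_table, snapshots=[], use_global=False)
global_aware = compact([l1, l2], old_table, snapshots=[])
with_snapshot = compact([l1, l2], old_table, snapshots=[4])
```

In this toy run, the local-only compaction keeps the stale version of `a` because no newer version is among its inputs, the globally aware compaction drops it, and a snapshot taken at sequence number 4 keeps it alive because that snapshot can still read it.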
Keywords/Search Tags:Key-Value Store, Log-Structured Merge-tree, Write Amplification, Invalid Data Awareness