Research On LSM-Tree Based Database Performance Optimization For HTAP Workload

Posted on: 2021-06-26
Degree: Master
Type: Thesis
Country: China
Candidate: B Q Shu
GTID: 2518306503474104
Subject: Software engineering
Abstract/Summary:
The log-structured merge-tree (LSM-Tree) is a storage structure with extremely low write latency, and it has become popular in modern databases. The modern Internet has seen a rise in large-scale data and increasingly complex business needs. Businesses now require a database not only to support online transaction processing (OLTP) but also to provide fast online analytical processing (OLAP). Such a database is called a hybrid transactional/analytical processing (HTAP) database.

To support HTAP workloads on an LSM-Tree based database, two shortcomings need to be addressed. The first is availability. An LSM-Tree stores fresh data in memory and persists it to disk when memory usage reaches a threshold. When the database faces a large volume of writes but memory is insufficient, it has to block write operations, which leaves the service unavailable. The second is read amplification, which arises because an LSM-Tree maintains a leveled structure and which degrades OLAP performance. RocksDB is the most popular and mature LSM-Tree based key-value store and is widely used as a storage engine for various data systems, so this thesis takes RocksDB as an example to address these problems and support HTAP workloads efficiently.

To ensure the database can absorb a large number of writes with high availability, this thesis inserts a data access layer for caching, collecting, and transferring data between the storage layer and the data generation layer. It is a distributed cache service, built on the Apache Flume framework, that sits between RocksDB and the data sources. The thesis also proposes a load-balancing strategy based on free memory, which allocates traffic effectively. As a result, the layer eases the pressure of peak traffic and improves service availability.

To reduce read amplification, this thesis improves OLAP performance with row-column mixed storage. Column-oriented storage is friendly to high concurrency, memory usage, compression, and sequential reads. RocksDB only supports row-oriented storage, which limits OLAP performance. This thesis therefore implements a column-oriented file structure and proposes an online hybrid storage strategy based on historical operations. The strategy monitors each valid file, calculates the gains and losses of format conversion, and triggers conversion for files whose gains exceed the conversion losses. The strategy is also embedded in RocksDB's background tasks so that the file format is determined before a file is created. In the end, data accessed mostly by OLTP remains in the original row-oriented format, while data frequently accessed by OLAP is converted to column-oriented format, which reduces the amount of disk access.

The experimental results meet expectations. The data access layer improves availability from 68.09% to 99.99% and scales well. Sequential reads on the column-oriented store reach 5 times the performance of the row-oriented store. Hybrid storage speeds up 84% of the queries in the TPC-DS benchmark, with improvements ranging from 5% to 60%. HTAP workloads achieve up to double the performance.
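
The abstract names two decision rules without giving their formulas: routing traffic according to free memory, and converting a file only when the gain exceeds the conversion loss. The Python sketch below illustrates one possible shape for each. Everything in it is an assumption made for illustration: the names `CacheNode`, `pick_node`, `FileStats`, and `should_convert_to_column`, the "most free memory wins" policy, and the linear cost model are hypothetical and are not taken from the thesis, RocksDB, or Apache Flume.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CacheNode:
    """Hypothetical cache node in the data access layer."""
    name: str
    free_memory_bytes: int


def pick_node(nodes: List[CacheNode]) -> Optional[CacheNode]:
    """Assumed policy: send the next batch to the node with the most free memory."""
    candidates = [n for n in nodes if n.free_memory_bytes > 0]
    if not candidates:
        return None  # every node is full; the caller has to back off
    return max(candidates, key=lambda n: n.free_memory_bytes)


@dataclass
class FileStats:
    """Hypothetical per-file access history."""
    scan_reads: int   # analytical (OLAP-style) accesses observed
    point_reads: int  # transactional (OLTP-style) accesses observed
    size_bytes: int


def should_convert_to_column(
    stats: FileStats,
    scan_saving_per_read: float,
    point_penalty_per_read: float,
    convert_cost_per_byte: float,
) -> bool:
    """Assumed linear cost model: convert only if the gain exceeds the loss."""
    gain = stats.scan_reads * scan_saving_per_read
    loss = (stats.point_reads * point_penalty_per_read
            + stats.size_bytes * convert_cost_per_byte)
    return gain > loss
```

Under this toy model, a file that is scanned often and rarely read by key is converted, while a file dominated by point reads keeps its row-oriented format. The cost coefficients are placeholders that a real system would have to measure.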
Keywords/Search Tags: High Availability, HTAP, LSM-Tree, row-column mixed storage, RocksDB, Apache Flume