The Design And Implementation Of Performance Optimization On Storage Module Of StellarDB

Posted on:2021-03-26

Degree:Master

Type:Thesis

Country:China

Candidate:J H Xu

GTID:2428330647950869

Subject:Engineering

Abstract/Summary:

Graph databases are becoming increasingly important as basic software because of their modeling advantages in real-world production scenarios and their performance advantages in relationship queries. Many software companies, both domestic and international, are building their own graph database systems. StellarDB is the graph database of Transwarp Inc.; combined with other components, it provides graph storage, graph analysis, graph visualization, and related functions. The StellarDB storage engine uses the Log-Structured Merge-Tree (LSM-tree) as its underlying storage structure.

During actual deployment, however, StellarDB encountered serious performance problems: performance dropped dramatically after highly concurrent writes of large data sets. Although the database node remained active, clients could not receive a response within the expected time after sending a query. The response timeout was 30 seconds, but StellarDB took several minutes to return query results.

The thesis describes the analysis of this problem. Performance data show that data piles up at the top of the LSM-tree and flows downward very slowly, so a read request may have to access up to thousands of data files to look up the primary key of a single entry, which prevents the system from responding in the expected time. The cause is that, as data flows down, the maintenance inherent to the LSM-tree structure rewrites a large amount of old data, wasting disk reads and writes. Because the disk was already running at its maximum read and write speed in this scenario, the remedy is to use disk I/O more efficiently and reduce the waste.

Based on this idea, the thesis implements an optimization scheme that reduces read and write waste. First, the flush algorithm is modified to force slicing of the memory buffer and to optimize the data distribution within files, which narrows the primary key range of each L0 data file and makes compaction tasks easier to schedule. Second, a more refined compaction algorithm selects as many upper-level data files as possible according to the primary key range of the lower-level data files and calculates the compaction plan with the least I/O waste. As a result, StellarDB reduces disk I/O waste during compaction and accelerates the flow of data down the LSM-tree.

The thesis details the design and implementation of this joint optimization of the flush and compaction algorithms, and presents the results of several performance tests conducted during the optimization process to explain the basis and effect of each step. In addition, it implements a simulator that abstracts and simplifies the StellarDB storage module, which is used to retest and verify the optimized algorithms. The final test results show that the compaction read amplification rate from L0 to L1 dropped from 419% to 119%, data no longer piled up at the top level, compaction at each level ran smoothly, and data flowed to the bottom level relatively quickly. After optimization, StellarDB's performance problems under highly concurrent writes of large data sets were resolved, and the system responds to read requests in a timely manner. The optimized algorithm is currently running in StellarDB with stable performance.
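The two optimizations can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the names `DataFile`, `sliced_flush`, and `plan_compaction` are hypothetical, keys are plain integers, and "I/O waste" is assumed here to mean lower-level bytes rewritten per upper-level byte moved down, a definition the abstract does not give.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a data file: inclusive key range and size in bytes."""
    min_key: int
    max_key: int
    size: int


def sliced_flush(entries: List[Tuple[int, int]], slices: int) -> List[DataFile]:
    """Flush a memory buffer of (key, value_size) pairs as several L0 files.

    Entries are sorted by key and cut into contiguous chunks, so each file
    covers a narrow key range instead of the whole buffer.
    """
    ordered = sorted(entries)
    if not ordered:
        return []
    chunk = -(-len(ordered) // slices)  # ceiling division
    files = []
    for start in range(0, len(ordered), chunk):
        part = ordered[start:start + chunk]
        files.append(DataFile(part[0][0], part[-1][0], sum(s for _, s in part)))
    return files


def plan_compaction(
    upper: List[DataFile], lower: List[DataFile]
) -> Optional[Tuple[float, DataFile, List[DataFile]]]:
    """Pick the lower-level file whose compaction wastes the least I/O.

    For each lower-level file, collect every upper-level file whose key range
    lies inside it, then score the plan by the assumed waste metric:
    lower-level bytes rewritten / upper-level bytes moved down.
    """
    best = None
    for target in lower:
        inputs = [
            f for f in upper
            if target.min_key <= f.min_key and f.max_key <= target.max_key
        ]
        moved = sum(f.size for f in inputs)
        if moved == 0:
            continue  # nothing fits inside this lower-level file
        waste = target.size / moved
        if best is None or waste < best[0]:
            best = (waste, target, inputs)
    return best


# Eight 10-byte entries, flushed as four narrow L0 files.
buffer_entries = [(key, 10) for key in range(8)]
l0 = sliced_flush(buffer_entries, slices=4)
l1 = [DataFile(0, 3, 100), DataFile(4, 7, 40)]
plan = plan_compaction(l0, l1)
```

In this toy setup the planner prefers the smaller L1 file, because the same amount of L0 data can be pushed down while rewriting fewer existing bytes. Flushing the same buffer as a single file (`slices=1`) yields one L0 file spanning keys 0 to 7, which fits inside neither L1 file, so the sketch finds no plan at all. That is the sense in which narrow L0 key ranges make compaction easier to schedule.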

Keywords/Search Tags:

Graph database, StellarDB, Log-Structured Merge-Tree, Flush, Compaction

Related items

1	Research And Implementation On Hybrid Compaction Mechanism Based On Log-structured Merge Tree
2	Research And Implementation Of Light-weight Compaction Key-value Storage System
3	Performance Optimization Of Log-structured Merge Tree In Database System
4	Research And Implementation Of A Key-value Storage System Based On NVM
5	Research On SSD-based LSM-Tree Key-Value Storage System
6	Design And Implementation Of Storage System Based On The Log-Structured Merge-Tree
7	Research And Implementation Of A New Hybrid Tree Key-value Store System
8	Graph-based data analysis: Tree-structured covariance estimation, prediction by regularized kernel estimation and aggregate database query processing for probabilistic inference
9	Research On LSM-tree Key-value Store Based On High-density SSD
10	Optimization On Key Value Database With Global Invalid Data Awareness