
HTAP-oriented Large-scale Distributed Database Hybrid Storage Engine

Posted on: 2021-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: R R Yao
GTID: 2428330620464192
Subject: Engineering
Abstract/Summary:
With the rapid development of the Internet, businesses are becoming more diversified and the volume of application data keeps growing. Traditional Online Analytical Processing (OLAP) and Online Transaction Processing (OLTP) database systems are independent of each other because they differ in architecture and in how they organize stored data, so neither can meet increasingly complex business requirements on its own. As a result, databases for hybrid workloads (Hybrid Transaction/Analytical Processing, HTAP) have emerged and become one of the main directions of database development. Traditional OLAP-oriented or OLTP-oriented databases use a single "column store" or "row store" storage scheme and therefore cannot exploit the advantages of both storage formats. Because an HTAP system must handle online transaction processing and online analytical processing together, designing a data storage strategy for HTAP has become an active research topic.

Targeting HTAP application scenarios, this thesis proposes a distributed HTAP database system framework and a data organization format for its storage engine, and uses machine learning to optimize the storage layout in the storage engine so that it better supports future HTAP workloads and thereby improves database performance. The thesis completes three main pieces of work:

1) It surveys the storage architectures and storage engine data organizations of popular distributed HTAP databases in industry, analyzes their main advantages and disadvantages, and designs and implements a distributed HTAP database framework and a flexible data storage format.
2) It designs and implements a storage layout reorganization algorithm. The density-based DenStream algorithm is combined with a greedy algorithm to periodically compute the optimal storage layout for future HTAP workloads.
3) It designs and implements an online reorganization technique that incrementally copies data from the old layout in the storage engine to the new layout, using the TileGroup as the reorganization granularity, and gradually transforms the layout into an OLAP-friendly storage layout.

Based on the Goldfish query engine of a self-developed distributed columnar database, this thesis designs and implements a distributed database system for HTAP. The data organization method of the system's storage engine is compared against row storage and column storage separately. According to the test data, for online analytical processing the proposed data organization is significantly more efficient than the row storage engine and comparable in performance to column storage. For online transaction processing it performs significantly better than the column storage engine and is comparable to row storage. In HTAP-oriented application scenarios, the proposed data storage structure combines the advantages of row storage and column storage, and its performance is significantly better than that of row-only or column-only data organization. It is better suited to HTAP workloads and greatly improves the performance of the HTAP database.
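The reorganization ideas in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the DenStream clustering step is replaced here by plain co-access counting, the merge threshold is arbitrary, and every name (`greedy_column_groups`, `TileGroup`, `reorganize_step`) is hypothetical. It only shows the general shape of two steps: greedily grouping columns that queries read together, and converting a table to the new layout one tile group at a time.

```python
from collections import Counter
from itertools import combinations


def greedy_column_groups(queries, columns, min_count=2):
    """Greedily merge columns that are often accessed together.

    queries: list of sets of column names (a hypothetical workload sample).
    min_count: hypothetical threshold below which pairs are not merged.
    Returns a list of column groups; each group would be stored contiguously.
    """
    co_access = Counter()
    for q in queries:
        for a, b in combinations(sorted(q), 2):
            co_access[(a, b)] += 1

    group_of = {c: {c} for c in columns}
    # Visit pairs from most to least co-accessed and merge their groups.
    ranked = sorted(co_access.items(), key=lambda kv: (-kv[1], kv[0]))
    for (a, b), count in ranked:
        if count < min_count:
            break
        if group_of[a] is not group_of[b]:
            merged = group_of[a] | group_of[b]
            for c in merged:
                group_of[c] = merged

    seen, groups = set(), []
    for c in columns:
        if c not in seen:
            groups.append(sorted(group_of[c]))
            seen |= group_of[c]
    return groups


class TileGroup:
    """A horizontal slice of a table, stored as one tile per column group."""

    def __init__(self, rows, layout):
        self.layout = layout
        self.tiles = [[tuple(r[c] for c in g) for r in rows] for g in layout]

    def rows(self):
        """Reassemble full rows (as dicts) from the per-group tiles."""
        count = len(self.tiles[0]) if self.tiles else 0
        out = []
        for i in range(count):
            row = {}
            for group, tile in zip(self.layout, self.tiles):
                row.update(zip(group, tile[i]))
            out.append(row)
        return out


def reorganize_step(table, new_layout):
    """Copy at most one tile group into new_layout; return True if one was converted."""
    for i, tg in enumerate(table):
        if tg.layout != new_layout:
            table[i] = TileGroup(tg.rows(), new_layout)
            return True
    return False
```

Calling `reorganize_step` repeatedly, for example from a background task, converts the table gradually, so tile groups in the old and new layouts coexist until the loop returns `False`. A layout with a single group holding all columns behaves like row storage, and one group per column behaves like column storage.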
Keywords/Search Tags:mixed load, storage format, layout reorganization algorithm, online reorganization technology, optimal storage layout