Design And Implementation Of Optimization Method For Distributed Columnar In-Memory Database Storage Engine

Posted on: 2022-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: G R Liu
GTID: 2518306524489674
Subject: Master of Engineering
Abstract/Summary:
With the advent of the big data era, enormous amounts of data are being stored, and how to process them efficiently has become an increasingly pressing question. Traditional Online Transaction Processing (OLTP) database systems cannot effectively meet the needs of analyzing massive volumes of data. Online Analytical Processing (OLAP) database systems have therefore received wide attention and become a research hotspot. Growing memory capacity and falling memory prices have made it possible to store data directly in memory, closer to the CPU. Combined with columnar storage, this can greatly improve data processing efficiency. However, current in-memory database storage still has shortcomings:
- Data is scattered, and irrelevant data cannot be filtered out effectively, so a query must access many data slices.
- Data is organized in a single way, with the same encoding applied to data that has very different characteristics.
This thesis addresses these problems of the distributed columnar in-memory database storage engine. It proposes optimizations in data partitioning, data storage, data transfer and data computation. The aim is to organize data efficiently and reduce memory usage while improving the access efficiency of the storage engine. The main work is as follows.
1. Design and implement a new data partitioning method. Instead of the original scattered storage of data, we propose a partitioning method suited to analytical workloads and build multi-level statistics. The data is divided into groups by the value of one attribute, and the groups are sorted by the value of another attribute. Each group stores and manages its data by column, which reduces dependence on the original table's Row IDs. Only the Row ID mapping within each group needs to be maintained, which lowers the cost of maintaining Row IDs. The data of each column in each group can be further divided into smaller storage units, and statistics can be built at each level so that the data to scan can be selected at different granularities.
2. Design and implement multiple storage formats and intermediate data structures. For each form of data organization, a suitable storage format is chosen according to the characteristics of the data itself, balancing access speed and space efficiency. Building on this storage structure, the intermediate data structures used to transfer data between nodes are optimized to reduce network communication overhead and memory consumption.
3. Design and implement storage-layer operators. Building on the new partitioning method, storage-layer operators are used under certain conditions so that part of the computation is completed inside the storage engine. This avoids transferring large amounts of data, improves the resource utilization of the storage engine, and reduces query latency.
This work is based on Gold Fish, a self-developed distributed columnar in-memory database. The optimized system is tested for both functionality and performance. The results show that memory consumption is significantly reduced and that query performance improves to varying degrees.
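The partitioning and multi-level statistics in item 1 can be illustrated with a small in-memory sketch. It assumes one reading of the text: rows within each group are sorted by the second attribute. The names `build_groups` and `scan_range`, the block size and the sample table are hypothetical and do not come from Gold Fish.

The sketch keeps only group-local positions mapped to original Row IDs. Min/max statistics at the group and block levels let a range scan skip data.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ColumnBlock:
    values: list      # one storage unit of a single column
    min_val: Any      # block-level statistics
    max_val: Any


@dataclass
class Group:
    key: Any          # value of the grouping attribute
    row_ids: list     # group-local position -> original table Row ID
    columns: dict     # column name -> list[ColumnBlock]
    stats: dict       # column name -> (min, max) over the whole group


def build_groups(rows, group_col, sort_col, block_size):
    """Split rows into groups by group_col, sort each group by sort_col
    (assumed reading), then store every column as fixed-size blocks."""
    buckets = {}
    for row_id, row in enumerate(rows):
        buckets.setdefault(row[group_col], []).append((row_id, row))
    groups = []
    for key in sorted(buckets):
        members = sorted(buckets[key], key=lambda item: item[1][sort_col])
        columns, stats = {}, {}
        for col in members[0][1]:
            values = [row[col] for _, row in members]
            chunks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
            columns[col] = [ColumnBlock(c, min(c), max(c)) for c in chunks]
            stats[col] = (min(values), max(values))
        groups.append(Group(key, [rid for rid, _ in members], columns, stats))
    return groups


def scan_range(groups, col, lo, hi):
    """Return (matching original Row IDs, number of blocks actually read),
    pruning first by group statistics and then by block statistics."""
    matches, blocks_read = [], 0
    for g in groups:
        gmin, gmax = g.stats[col]
        if gmax < lo or gmin > hi:
            continue  # the whole group is skipped
        offset = 0
        for block in g.columns[col]:
            if not (block.max_val < lo or block.min_val > hi):
                blocks_read += 1
                for i, v in enumerate(block.values):
                    if lo <= v <= hi:
                        matches.append(g.row_ids[offset + i])
            offset += len(block.values)
    return matches, blocks_read


rows = [
    {"region": "east", "day": 3, "amount": 10},
    {"region": "west", "day": 1, "amount": 25},
    {"region": "east", "day": 1, "amount": 40},
    {"region": "west", "day": 2, "amount": 5},
    {"region": "east", "day": 2, "amount": 70},
    {"region": "east", "day": 5, "amount": 15},
    {"region": "east", "day": 4, "amount": 90},
]
groups = build_groups(rows, group_col="region", sort_col="day", block_size=2)
print(scan_range(groups, "day", 4, 5))  # ([6, 5], 2)
```

Here the "west" group is skipped from its group-level statistics alone. Within "east", the first block is skipped from its block-level statistics, so only two of the six blocks are read.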
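Item 2's idea of choosing a storage format per column, based on the data's own characteristics, can be sketched with a simple size-based cost model. The three candidate encodings (plain, run-length and dictionary), the byte widths and the function names are assumptions for illustration. The abstract does not say which formats Gold Fish actually uses.

```python
from itertools import groupby

VALUE_BYTES = 8   # assumed width of one uncompressed value (e.g. int64)
RUN_BYTES = 4     # assumed width of one run-length counter


def code_bytes(distinct):
    """Smallest fixed code width able to index `distinct` dictionary entries."""
    if distinct <= 1 << 8:
        return 1
    if distinct <= 1 << 16:
        return 2
    return 4


def estimate_sizes(values):
    """Estimated encoded size in bytes for each candidate format."""
    n = len(values)
    runs = sum(1 for _ in groupby(values))
    distinct = len(set(values))
    return {
        "plain": n * VALUE_BYTES,
        "rle": runs * (VALUE_BYTES + RUN_BYTES),
        "dict": distinct * VALUE_BYTES + n * code_bytes(distinct),
    }


def encode(values):
    sizes = estimate_sizes(values)
    kind = min(sizes, key=sizes.get)  # the cheapest format under the cost model wins
    if kind == "rle":
        return kind, [(v, len(list(run))) for v, run in groupby(values)]
    if kind == "dict":
        dictionary = sorted(set(values))
        index = {v: i for i, v in enumerate(dictionary)}
        return kind, (dictionary, [index[v] for v in values])
    return kind, list(values)


def decode(kind, payload):
    if kind == "rle":
        return [v for v, count in payload for _ in range(count)]
    if kind == "dict":
        dictionary, codes = payload
        return [dictionary[c] for c in codes]
    return list(payload)


columns = {
    "sorted": [1] * 50 + [2] * 50,            # long runs, e.g. a sort key
    "low_card": [i % 3 for i in range(100)],  # few distinct values, no runs
    "unique": list(range(100)),               # all values distinct
}
chosen = {name: encode(col)[0] for name, col in columns.items()}
print(chosen)  # {'sorted': 'rle', 'low_card': 'dict', 'unique': 'plain'}
```

This model weighs only size. An engine balancing access speed as well, as the abstract describes, would also factor in decode cost when picking a format.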
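Item 3's storage-layer operators can be pictured as pushing a filter and a partial aggregate down to each storage node. Only a small partial result then crosses the network, instead of the matching rows. The operator name, the `PartialAgg` structure and the two-node example are hypothetical. They illustrate the general pushdown technique, not Gold Fish's actual operators.

```python
from dataclasses import dataclass


@dataclass
class PartialAgg:
    count: int = 0
    total: int = 0

    def merge(self, other):
        return PartialAgg(self.count + other.count, self.total + other.total)


def storage_sum_where(filter_col, value_col, lo, hi):
    """Hypothetical storage-layer operator: filter on one column and
    aggregate another locally, returning only a partial aggregate."""
    agg = PartialAgg()
    for f, v in zip(filter_col, value_col):
        if lo <= f <= hi:
            agg.count += 1
            agg.total += v
    return agg


def ship_rows_where(filter_col, value_col, lo, hi):
    """Baseline without pushdown: every matching value is sent to the
    compute layer, which then aggregates it."""
    return [v for f, v in zip(filter_col, value_col) if lo <= f <= hi]


# Two storage nodes, each holding columnar data for one group.
nodes = [
    {"day": [1, 2, 3, 4, 5], "amount": [40, 70, 10, 90, 15]},
    {"day": [1, 2], "amount": [25, 5]},
]

# With pushdown: one PartialAgg per node crosses the network.
partials = [storage_sum_where(n["day"], n["amount"], 2, 4) for n in nodes]
pushed = PartialAgg()
for p in partials:
    pushed = pushed.merge(p)

# Without pushdown: all matching values cross the network.
shipped = [v for n in nodes for v in ship_rows_where(n["day"], n["amount"], 2, 4)]

print(pushed, len(partials))       # PartialAgg(count=4, total=175) 2
print(sum(shipped), len(shipped))  # 175 4
```

The gap is small in this toy example. However, the pushed-down version sends one partial aggregate per node however many rows match, while the baseline's transfer grows with the number of matching rows.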
Keywords/Search Tags: In-memory Database, Columnar Storage, Memory Space Optimization, Distributed System