Design And Implementation Of Optimization Method For Distributed Columnar In-Memory Database Storage Engine

Posted on:2022-04-30

Degree:Master

Type:Thesis

Country:China

Candidate:G R Liu

Full Text:PDF

GTID:2518306524489674

Subject:Master of Engineering

Abstract/Summary:

With the advent of the big data era, enormous volumes of data are being stored, and processing them efficiently has become a pressing question. Traditional Online Transaction Processing (OLTP) database systems cannot effectively meet the needs of analyzing such volumes, so Online Analytical Processing (OLAP) database systems have received wide attention and become a research hotspot. Growing memory capacity and falling memory prices have made it possible to store data directly in memory, closer to the CPU, which, combined with columnar storage, can greatly improve data processing efficiency. However, current in-memory database storage still has shortcomings: data is scattered, so irrelevant data cannot be filtered out effectively and multiple data slices must be accessed; and data is organized in a single way, with the same encoding applied to data with different characteristics.

This thesis addresses these problems in the storage engine of a distributed columnar in-memory database and proposes optimizations in four areas: data partitioning, data storage, data transfer, and data computation. The aim is to organize data efficiently and reduce memory usage while improving the access efficiency of the storage engine. The main work is as follows.

1. Design and implement a new data partitioning method. The original scattered storage of data is replaced by a partitioning method suited to analytical workloads, with multi-level statistics. Data is divided into groups by the value of one attribute, and the data within each group is sorted by the value of another attribute. Each group stores and manages its data by column, which reduces the impact of the original table Row ID: only the Row ID mapping within the group needs to be maintained, lowering the cost of maintaining Row IDs. The data in each column of each group can be further divided into smaller storage units, and statistics can be created at each level so that the data to scan can be selected at different granularities.

2. Design and implement multiple storage formats and intermediate data structures. For each form of data organization, a suitable storage format can be selected according to the characteristics of the data itself, balancing access speed against space efficiency. Based on the storage structure, the intermediate data structures used to transfer data between nodes are optimized to reduce network communication overhead and memory consumption.

3. Design and implement storage-layer operators. Building on the new data partitioning method, storage-layer operators are chosen under certain conditions so that part of the computation is completed inside the storage engine. This avoids a large amount of data transfer, improves the resource utilization of the storage engine, and reduces query latency.

The work is based on Gold Fish, a self-developed distributed columnar in-memory database, and the optimized system is tested for both functionality and performance. The results show that memory consumption is significantly reduced and query performance is improved to varying degrees.
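The partitioning scheme in item 1 can be illustrated with a minimal Python sketch. It is not the Gold Fish implementation: the names `Chunk`, `Group`, `build_groups`, and `scan`, the sample columns, and the use of min/max values as the "statistics" are all assumptions made for illustration. Rows are grouped by one attribute, sorted within each group by another, stored by column in small units, and a range scan consults group-level and then unit-level statistics to skip data that cannot match.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Chunk:
    """Smallest storage unit (hypothetical): column slices plus min/max of the sort key."""
    columns: dict  # column name -> list of values
    min_key: int
    max_key: int


@dataclass
class Group:
    """All rows sharing one partition value, sorted by the sort column and split into chunks."""
    chunks: list
    min_key: int
    max_key: int


def build_groups(rows, partition_col, sort_col, chunk_size=2):
    """Partition rows by one attribute, sort each group by another, store by column."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[partition_col]].append(row)

    groups = {}
    for value, members in buckets.items():
        members.sort(key=lambda r: r[sort_col])
        chunks = []
        for start in range(0, len(members), chunk_size):
            part = members[start:start + chunk_size]
            # Pivot the row slice into columns.
            columns = {name: [r[name] for r in part] for name in part[0]}
            keys = columns[sort_col]
            # Rows are sorted, so the first and last keys are the chunk's min and max.
            chunks.append(Chunk(columns, keys[0], keys[-1]))
        groups[value] = Group(chunks, chunks[0].min_key, chunks[-1].max_key)
    return groups


def scan(groups, partition_value, sort_col, low, high, project):
    """Return (projected values with low <= sort_col <= high, number of chunks read)."""
    group = groups.get(partition_value)
    # Group-level statistics: skip the whole group if its key range cannot overlap.
    if group is None or group.max_key < low or group.min_key > high:
        return [], 0
    result, chunks_read = [], 0
    for chunk in group.chunks:
        # Chunk-level statistics: skip units whose key range cannot overlap.
        if chunk.max_key < low or chunk.min_key > high:
            continue
        chunks_read += 1
        for key, value in zip(chunk.columns[sort_col], chunk.columns[project]):
            if low <= key <= high:
                result.append(value)
    return result, chunks_read


# Illustrative data only; the column names are made up for this sketch.
rows = [
    {"region": "east", "day": 5, "sales": 50},
    {"region": "east", "day": 1, "sales": 10},
    {"region": "west", "day": 2, "sales": 20},
    {"region": "east", "day": 3, "sales": 30},
    {"region": "east", "day": 7, "sales": 70},
    {"region": "west", "day": 9, "sales": 90},
]
groups = build_groups(rows, "region", "day", chunk_size=2)
values, chunks_read = scan(groups, "east", "day", 4, 6, "sales")
```

In this sketch the scan over the "east" group for days 4 to 6 reads only one of the group's two chunks, because the first chunk's maximum key falls below the range. A real engine would presumably keep richer statistics and per-group Row ID mappings, which this sketch omits.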

Keywords/Search Tags:

In-memory Database, Columnar Storage, Memory Space Optimization, Distributed System

Related items

1	Design And Implementation Of High Reliability Columnar Storage Engine For Distributed Memory Database
2	Design And Implementation Of Transaction System Based On Distributed Columnar In-memory Database
3	Compilation Execution Framework Of Massive Distributed Memory Columnar Database
4	The Research Of Data Storage In Distributed Main Memory Database
5	RDMA-based Distributed Database Memory Storage System
6	Massive Distributed In-memory Columnar Database Query Engine For On-line Analytical Processing
7	Design And Implementation Of Distributed Memory Object System Based On RDMA
8	DNN-Oriented Multi-memory Distributed Parameter Storage And Read-write Optimization Method
9	Design And Implementation Of Query Optimizer For Massive Distributed Columnar Database
10	The Enterprise Data Real-Time Analysis System Based On In-Memory Database