Design And Implementation Of Acceleration Method For Massive Distributed In-Memory Database Query Engine

Posted on: 2019-02-08

Degree: Master

Type: Thesis

Country: China

Candidate: B Y Li

GTID: 2348330563953993

Subject: Computer application technology

Abstract/Summary:

With the advent of the information era, disk-based databases find it increasingly difficult to respond quickly when large amounts of data are queried. To further improve real-time querying of massive data, performing computation entirely in memory has gradually become a new direction for database development. Because an in-memory database keeps all of its data in memory, data that traditional databases store on disk becomes directly addressable in memory. This change brings new problems, such as how to use memory space efficiently and how to make the in-memory query engine execute more efficiently. This thesis proposes an optimization scheme for the query engine of an "OLAP-oriented distributed columnar in-memory database", targeting the new problems caused by the change of storage medium. The goal is to further reduce memory overhead and query latency when the query engine of a distributed in-memory database processes large amounts of data. The main work of this thesis is as follows:

1. Design and implement an intermediate data structure suited to the query engine of a distributed columnar in-memory database. It reduces the memory cost of holding data, improves the execution efficiency of each physical operator in the execution layer, and ultimately enables fast, real-time analysis of massive data.
2. Based on this intermediate data structure, design and implement a set of physical operators that make full use of the CPU and speed up computation in the query engine.
3. Design multiple implementations for key physical operators. During query processing, the most appropriate implementation of each operator is selected dynamically on the distributed cluster, using database metadata and data distribution histograms together with the storage locations of data fragments, network transmission overhead, storage engine node load and other information, thereby reducing query latency.
4. For in-memory computation, design and implement a set of data distribution strategies that take the data load of multiple nodes into account, reducing network transmission overhead and further speeding up computation in the query engine.

Finally, this thesis uses TPC-H, the standard OLAP benchmark, to run comprehensive functional and performance tests on the distributed columnar in-memory database query engine, and compares its query performance with Spark SQL using cached temporary tables. The performance results show that, with the acceleration method designed in this thesis, range queries run more than 3 times faster than on Spark SQL, grouped aggregation statements more than 8 times faster, and sort statements 2.5 times faster, while memory overhead is only one-ninth that of Spark SQL. The comparison therefore shows that the system outperforms Spark SQL in both query speed and memory efficiency.
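The abstract does not state how the cost of each operator implementation is estimated when one is selected dynamically. As a rough illustration only, the Python sketch below assumes a common textbook approach rather than the thesis's own model: row counts are estimated from an equi-width histogram, a broadcast or shuffle (hash-partitioned) join is chosen by estimated network bytes, and an execution node is picked by a weighted score of node load and data locality. All names (`Node`, `estimate_rows`, `choose_join`, `choose_node`) and the 0.5 weighting are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical per-node statistics used by this illustrative cost model."""
    name: str
    load: float      # assumed normalized load metric in [0.0, 1.0]
    local_rows: int  # rows of the operator's input already stored on this node


def estimate_rows(histogram, lo, hi):
    """Estimate how many rows satisfy lo <= value < hi.

    histogram: list of (bucket_lo, bucket_hi, count) with bucket_hi > bucket_lo;
    values are assumed to be uniformly distributed inside each bucket.
    """
    total = 0.0
    for b_lo, b_hi, count in histogram:
        # Width of the part of this bucket that overlaps the query range.
        overlap = max(0.0, min(hi, b_hi) - max(lo, b_lo))
        total += count * overlap / (b_hi - b_lo)
    return total


def choose_join(left_rows, right_rows, row_bytes, n_nodes):
    """Pick a distributed join implementation by estimated network bytes.

    Hypothetical cost model: returns (implementation_name, estimated_bytes).
    """
    # Broadcast join: ship the smaller input to every other node.
    broadcast = min(left_rows, right_rows) * row_bytes * (n_nodes - 1)
    # Shuffle join: on average (n - 1) / n of both inputs cross the network.
    shuffle = (left_rows + right_rows) * row_bytes * (n_nodes - 1) / n_nodes
    if broadcast <= shuffle:
        return ("broadcast", broadcast)
    return ("shuffle", shuffle)


def choose_node(nodes, total_rows, load_weight=0.5):
    """Pick the node minimizing a weighted sum of load and remote data share.

    Assumes total_rows > 0; the weighting is an illustrative assumption.
    """
    def score(node):
        remote_fraction = (total_rows - node.local_rows) / total_rows
        return load_weight * node.load + (1 - load_weight) * remote_fraction

    return min(nodes, key=score)
```

For example, `choose_join(1_000_000, 10_000, 100, 4)` favors broadcasting the small table: about 3 MB would move at 100 bytes per row, versus roughly 75.75 MB for a shuffle. A real engine would also weigh CPU and memory costs, which this sketch ignores.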

Keywords/Search Tags:

distributed memory database, database query engine, data distribution strategy, physical operator optimization

Related items

1	Design And Implementation Of Query Optimization Module For Distributed Column Database Based On Memory
2	The Research And Application Of Query Optimization In Distributed Database System
3	Distributed Joins And Optimization For BIG Table Based On Database OceanBase
4	Research On Query Optimization Method Of Database Based On Cache Strategy
5	Research On Data Query Processing And Optimization In Distributed Database
6	Semijoin Strategy-based Distributed Database Query Optimization Theory And Applications
7	Optimization Of Query Algorithm For Distributed Relational Database
8	The Research Of Memory Database Query Optional In Multi-core System
9	Massive Distributed In-memory Columnar Database Query Engine For On-line Analytical Processing
10	Applied Study On Query Optimization Strategy In A New Generation Of Database