
Design And Implementation Of Acceleration Method For Massive Distributed In-Memory Database Query Engine

Posted on: 2019-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: B Y Li
Full Text: PDF
GTID: 2348330563953993
Subject: Computer application technology
Abstract/Summary:
With the advent of the information era, disk-based databases find it increasingly difficult to respond quickly when large volumes of data are queried. To further improve real-time querying of massive data, performing computation entirely in memory has gradually become a new direction in database development. Because an in-memory database stores its data entirely in memory, data that a traditional database keeps on disk becomes directly addressable data in memory. This change brings new problems, such as how to use memory space efficiently and how to improve the execution efficiency of the in-memory database query engine. This thesis proposes an optimization scheme for the query engine of an "OLAP-oriented distributed columnar memory database", targeting the new problems brought about by the change of storage medium. The goal is to further reduce memory overhead and query latency when the query engine of a distributed in-memory database processes large amounts of data. The main work of this thesis is as follows:

1. Design and implement an intermediate data structure suited to the query engine of a distributed columnar in-memory database. It reduces the storage cost of data in memory and improves the execution efficiency of each physical operator in the execution layer, enabling fast real-time analysis of massive data.
2. Based on this intermediate data structure, design and implement a set of physical operators that make fuller use of the CPU and speed up the query engine's computation.
3. Design multiple implementations for key physical operators. During query processing, the most appropriate implementation of each operator is selected dynamically on the distributed cluster according to the database metadata and data distribution histograms, combined with information such as data fragment storage locations, network transmission overhead, and storage engine node load, in order to reduce query latency.
4. Design and implement a set of data distribution strategies for the in-memory computation process, based on multi-node data load information, to reduce network transmission overhead and thereby further speed up the query engine.

Finally, this thesis uses TPC-H, the standard test set for OLAP databases, to perform comprehensive functional and performance tests on the distributed columnar in-memory database query engine, and compares its query performance with that of Spark-SQL with a temporary cache table established. The performance test results show that, with the query engine acceleration method designed in this thesis, range queries are more than 3 times faster than in Spark-SQL, the performance of grouped aggregate statements is more than 8 times that of Spark-SQL, and the performance of sort statements is more than 2.5 times that of Spark-SQL, while the memory overhead is only one-ninth that of Spark-SQL. In this comparison, the system outperforms Spark-SQL in both query speed and memory usage efficiency.
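The dynamic selection of an operator implementation described in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm: the cost formulas, the three join variants, and every name below (`OperatorStats`, `estimate_costs`, `choose_implementation`) are assumptions made for illustration. The sketch uses only estimated row counts, fragment placement, and network volume, and leaves out storage engine node load.

```python
from dataclasses import dataclass


@dataclass
class OperatorStats:
    """Hypothetical planner inputs, e.g. derived from metadata and histograms."""
    left_rows: int    # estimated rows in the left input
    right_rows: int   # estimated rows in the right input
    row_bytes: int    # assumed average row size in bytes
    nodes: int        # number of nodes taking part in the join
    colocated: bool   # True if matching fragments already share a node


def estimate_costs(stats: OperatorStats) -> dict:
    """Estimate bytes sent over the network for each candidate join variant."""
    smaller = min(stats.left_rows, stats.right_rows)
    total = stats.left_rows + stats.right_rows
    costs = {
        # Broadcast: ship the smaller input to every other node.
        "broadcast_join": smaller * stats.row_bytes * (stats.nodes - 1),
        # Repartition: on average a fraction (nodes - 1) / nodes of all rows moves.
        "repartition_join": total * stats.row_bytes * (stats.nodes - 1) / stats.nodes,
    }
    if stats.colocated:
        # Matching fragments are on the same node, so nothing is transferred.
        costs["local_join"] = 0
    return costs


def choose_implementation(stats: OperatorStats) -> str:
    """Pick the variant with the lowest estimated network cost."""
    costs = estimate_costs(stats)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    small_vs_large = OperatorStats(1_000_000, 1_000, 100, 4, False)
    print(choose_implementation(small_vs_large))
```

Under these assumed formulas, a small input joined with a large one favors broadcasting the small side, two large inputs favor repartitioning, and co-located fragments avoid network transfer altogether.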
Keywords/Search Tags: distributed memory database, database query engine, data distribution strategy, physical operator optimization