
Compilation Execution Framework Of Massive Distributed Memory Columnar Database

Posted on: 2021-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: J Feng
Full Text: PDF
GTID: 2428330623468544
Subject: Engineering
Abstract/Summary:
With the explosive growth of data volumes, distributed computing systems have been developed to process structured data, that is, to perform online analytical processing. Existing distributed computing systems, such as distributed in-memory databases, generally process large amounts of data in batches and store the data in a format better suited to batch processing, namely columnar storage. At the same time, in-memory databases use larger memory capacity to overcome the disk "I/O wall" of traditional databases. To improve computing capability further, the "memory wall" problem of batch processing must be solved. Using compilation technology to dynamically generate executable code for computing tasks can further improve the overall computing capability of each server, and of the system as a whole, in a distributed in-memory database, and at the same time places new requirements on plan generation and execution. On the one hand, this approach uses run-time information to generate streamlined, efficient executable code, which improves code locality. On the other hand, it can fuse multiple computing tasks to reduce reads and writes of in-memory data, which improves data locality. The latter aspect also suits non-volatile memory, a new storage medium.

Based on a distributed columnar in-memory database, this thesis studies how to improve computing efficiency by dynamically generating and executing code, together with the corresponding plan generation and execution methods. The main work is as follows:

1. Research, design, and implement a compilation and execution framework for a large-scale distributed in-memory columnar database. The framework converts the execution plan graph of a computing task into underlying computing primitives (operators) and can fuse the processing of multiple computing primitives into the executable code of a single function. The framework also supports custom computing primitives, which add specific computing capabilities to the generated code.

2. Design and implement a database execution framework based on compiled execution, together with a set of computing primitives. The execution framework is responsible for receiving tasks, scheduling execution, and managing generated code. After receiving a distributed execution plan, it uses the basic computing primitives to compile the plan into executable code and then runs it.

3. Study a fusion strategy for execution plan nodes in a distributed environment. The strategy fuses related computing functions to improve data locality and thereby speed up the execution of adjacent execution plan nodes.

Finally, this thesis tests the functionality and performance of the system composed of the execution framework and the compilation execution framework. The tests show that code generated by the compilation and execution technology runs significantly faster, and that the code generated for fused execution plan nodes significantly reduces execution time by reducing in-memory data transfer.
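The idea of fusing several computing primitives into one generated function can be illustrated with a minimal sketch. It is not the thesis's framework: the function names (`run_unfused`, `compile_fused`), the filter/multiply/sum pipeline, and the use of Python's built-in `compile` and `exec` as a stand-in for a real code generator are all assumptions made for illustration. The unfused version materializes an intermediate column after each primitive, while the fused version is generated at run time, bakes the run-time threshold in as a constant, and makes a single pass over the input columns.

```python
def run_unfused(prices, qtys, threshold):
    """Operator-at-a-time execution: each primitive writes a full intermediate column."""
    mask = [p > threshold for p in prices]               # filter primitive
    kept_prices = [p for p, m in zip(prices, mask) if m]  # materialized intermediate
    kept_qtys = [q for q, m in zip(qtys, mask) if m]      # materialized intermediate
    revenue = [p * q for p, q in zip(kept_prices, kept_qtys)]  # projection primitive
    return sum(revenue)                                   # aggregation primitive


def compile_fused(threshold):
    """Generate one function that fuses filter, projection and aggregation.

    Hypothetical sketch: the threshold is run-time information that is
    embedded in the generated source as a constant.
    """
    src = (
        "def fused(prices, qtys):\n"
        "    total = 0\n"
        "    for p, q in zip(prices, qtys):\n"
        f"        if p > {threshold!r}:\n"
        "            total += p * q\n"
        "    return total\n"
    )
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["fused"]


if __name__ == "__main__":
    prices = [5, 12, 20, 8]
    qtys = [1, 2, 3, 4]
    fused = compile_fused(10)
    # Both strategies compute the same result; only the memory traffic differs.
    print(run_unfused(prices, qtys, 10), fused(prices, qtys))
```

In this sketch the fused function never builds the mask or the filtered columns, which is the kind of reduction in memory reads and writes that the abstract attributes to fusing adjacent plan nodes.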
Keywords/Search Tags:Distributed System, In-memory Computing, Database, Compilation