Optimization Of Database Management System Based On Near Data Processing

Posted on:2021-02-17

Degree:Master

Type:Thesis

Country:China

Candidate:Z Xiong

Full Text:PDF

GTID:2428330611962400

Subject:Computer Science and Technology

Abstract/Summary:


Faced with the rapid growth of data, data centers urgently need to improve their data processing capacity. The transfer of massive data from storage devices to the host is one of the bottlenecks in large-scale data processing and has received great attention in recent years. Near Data Processing (NDP) proposes pushing computation down to a position closer to the storage node that holds the data. As one of the most common applications for data processing and storage, the database management system (DBMS) is a good carrier for near data processing research. To address this challenge, this thesis studies database optimization based on the idea of near data processing and the characteristics of new storage devices. Because new storage devices offer some computing power, high bandwidth, and low latency, this line of research has attracted extensive attention. In this thesis, a remote procedure call (RPC) mechanism is designed to separate the storage engine from the database system. Then, a database optimization method based on a near data processing framework is proposed to address the loss of computing resources caused by data transmission, considering performance, power consumption, and reliability. The specific contributions of this thesis are as follows:

(1) A near data processing framework is the key to moving data-intensive computation to storage nodes and reducing the system overhead caused by frequent data transmission. For the database to support near data processing, a suitable communication protocol must be selected so that computing tasks on the host and on the storage device can run cooperatively. Therefore, a cooperative database processing method based on RPC communication is proposed. The storage engine is separated from the database so that it can run independently on the storage device, which supports pushing data-intensive computation down. Theoretical analysis and experiments show that near data processing can greatly reduce data transmission overhead once the volume of transmitted data reaches a certain scale.

(2) To improve database processing performance under the near data processing framework, this thesis proposes an optimization method for data-intensive operators. The basic idea is to separate the operators involved in data-intensive computation and then selectively push them down. Selecting and separating operators reduces the reverse transmission of data and the computational load on the storage device. On this basis, and building on the storage engine separation, a near data optimization method based on database operator separation is proposed. Experimental analysis shows that this scheme reduces unnecessary pushdown of computation, reduces the data transmission caused by operator calls, and improves the computational performance of the database system.

(3) When pushing down data-intensive operators, a traditional query executor pushes down the relevant operators based only on cost or empirical rules. The system can neither judge whether to adopt the near data processing mode nor objectively evaluate when an operator is suitable for pushdown. To address this, a query optimization scheme for near data processing based on sampling cost estimation is proposed. Before the query engine generates the query plan, the host obtains the filtering effect of the query operation by sampling, uses cost estimation to evaluate the query plan in which the data-intensive operators are pushed down, and selectively executes the near data processing scheme. Experimental results show that this scheme can effectively select the appropriate query plan and improve the efficiency of the database system.
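The sampling-based pushdown decision described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `estimate_selectivity` and `choose_plan`, the linear cost model, and all cost constants are assumptions chosen only to show the general idea of estimating a filter's selectivity from a sample and then comparing a host-side plan against a pushed-down plan.

```python
import random


def estimate_selectivity(rows, predicate, sample_size=100, seed=0):
    """Estimate the fraction of rows that pass `predicate` from a random sample.

    Hypothetical helper: the thesis does not specify its sampling procedure.
    """
    if not rows:
        return 0.0
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    return sum(1 for r in sample if predicate(r)) / len(sample)


def choose_plan(num_rows, row_bytes, selectivity,
                transfer_cost_per_byte=1.0,   # assumed cost unit
                host_cost_per_row=0.5,        # assumed: host CPU is faster
                device_cost_per_row=2.0):     # assumed: device CPU is slower
    """Compare two plans under an assumed linear cost model.

    host plan:     ship every row to the host, then filter there.
    pushdown plan: filter on the storage device, ship only matching rows.
    Returns (plan_name, host_cost, pushdown_cost).
    """
    host_cost = (num_rows * row_bytes * transfer_cost_per_byte
                 + num_rows * host_cost_per_row)
    pushdown_cost = (num_rows * device_cost_per_row
                     + selectivity * num_rows * row_bytes * transfer_cost_per_byte)
    plan = "pushdown" if pushdown_cost < host_cost else "host"
    return plan, host_cost, pushdown_cost


if __name__ == "__main__":
    table = list(range(10_000))
    sel = estimate_selectivity(table, lambda r: r % 100 == 0)
    plan, host_cost, pushdown_cost = choose_plan(len(table), row_bytes=100,
                                                 selectivity=sel)
    print(plan, host_cost, pushdown_cost)
```

Under this assumed model, pushdown wins when the filter discards enough data that the saved transfer cost outweighs the slower computation on the storage device; a non-selective filter over small rows favors the host plan.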

Keywords/Search Tags:

near data processing, database, RPC, storage engine, query optimization


Related items

1	Research On Key Technologies Of Distributed Rank-aware Query Processing
2	Research On SPARQL Query Engine Across Different Storage Platform
3	Design And Implementation Of Acceleration Method For Massive Distributed In-Memory Database Query Engine
4	Research On Data Query Processing And Optimization In Distributed Database
5	A Hybrid Storage Engine Based On The Architecture Of Read/Write Separation
6	Research On Techniques And Systems For Index And Query Optimization Of Big Data
7	The Design And Optimization Of XML Data Query Method Based On The SF&B Compressing Storage Structure
8	Parallel Query Processing Techniques In Parallel Database System PBASE/2
9	Optimization Of Query Algorithm For Distributed Relational Database
10	Query Processing And Optimization In Massive Multi-Database Integration