
Cost Model And Query Optimization For In-memory Column-stored Database

Posted on: 2021-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Kong
GTID: 2428330611455049
Subject: Computer Science and Technology
Abstract/Summary:
With the continuous development of computer science and the Internet, the amount of data that needs to be processed keeps growing, which places ever higher demands on database performance. Alongside the usage scenarios of traditional databases, which require high performance for insertion, deletion, modification and query at the same time, scenarios that mainly emphasize query and analysis performance are becoming more common. Column database technology has received increasing attention in these scenarios.

Existing research on columnar databases focuses on storage models and query execution, and few studies address query optimization specifically for column databases. At present, the main columnar database frameworks, such as C-Store, VectorWise and MonetDB, either have no specialized cost model or only use a cost model based on row databases. These columnar databases also either perform no query optimization or optimize only the part that is common to column databases and row databases. The field of query optimization in row databases, by contrast, is much more mature. Traditional databases such as MySQL and PostgreSQL have relatively complete query optimizers that optimize a query plan with a fixed optimization process. In the field of massive data analysis, the distributed row databases Greenplum and HAWQ adopt a query optimization framework based on rules, cost models and plan search, and optimize task scheduling for distributed scenarios.

In this thesis, a cost model for distributed columnar in-memory databases is proposed. The cost model estimates the execution cost of a query plan in a columnar database more accurately, and so reduces the impact that cost model inaccuracy has on query optimization. Based on this cost model, a query optimization framework built on transformation rules, plan search and task scheduling is implemented for large-scale columnar in-memory databases. Finally, this thesis tests the performance of the query optimization framework and compares it with a query optimizer that uses a row database cost model and with one that uses a fixed optimization process. The test results show that the query optimizer designed in this thesis obtains better optimization results when optimizing queries for the columnar database.
Keywords/Search Tags: Database, Distributed, Columnar Database, Cost Model, Query Optimization