Optimization And Implementation For DWMS Column-Store Query Execution Engine

Posted on:2013-01-12

Degree:Master

Type:Thesis

Country:China

Candidate:Q Zhang

GTID:2218330371955887

Subject:Computer software and theory

Abstract/Summary:

Query execution efficiency is the main concern when developing a data warehouse management system. Recent studies show that column-store systems suit the query-oriented workload of a data warehouse better than row-store systems, because a column store reads only the data relevant to a query and so avoids unnecessary I/O. Column-store systems also achieve higher compression ratios than row-store systems.

We studied data warehouse and column-store techniques and the core modules of the query execution engine, and then optimized the structures and strategies of the current query execution engine based on the features of column storage. The structures are the transfer-block and the operator nodes. The strategies are a reusability estimation model based on the relative position of a given operator node in the physical execution tree, and a reuse buffer scheduling algorithm based on the results of that model.

First, we analyze the characteristics of data warehouses and column stores, in particular the storage and query execution approaches of MonetDB, C-Store, and InfoBright, as well as their disadvantages.

Second, we introduce the core elements of a column-store query execution engine in detail: the physical execution tree, the operator nodes that form it, the way data are organized in memory, and the iterators and iterator network through which data are processed and transferred.

Then we optimize the current query execution engine in terms of both structure and strategy and, based on the optimized design, implement a more advanced query execution engine. On the structure side, we design a new transfer-block that can store rowids as well as values of various types. The new transfer-block also supports low-cost tuple reconstruction based on the relationships among positions.
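As an illustration of position-based tuple reconstruction, the following is a minimal sketch and not the thesis's actual design. It assumes a transfer-block carries one column's values together with ascending rowids; the names `TransferBlock` and `reconstruct` are hypothetical. Under that assumption, two blocks can be stitched into tuples with a single merge pass over their rowids, with no hashing or sorting.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class TransferBlock:
    """Hypothetical transfer block: one column's values plus their row ids."""
    column: str
    rowids: List[int]   # assumed to be in ascending order
    values: List[Any]   # values[i] belongs to rowids[i]


def reconstruct(left: TransferBlock, right: TransferBlock) -> List[Tuple[int, Any, Any]]:
    """Stitch two column blocks into (rowid, left value, right value) tuples.

    Because both rowid lists are ascending, one linear merge pass is enough.
    """
    out: List[Tuple[int, Any, Any]] = []
    i = j = 0
    while i < len(left.rowids) and j < len(right.rowids):
        if left.rowids[i] == right.rowids[j]:
            out.append((left.rowids[i], left.values[i], right.values[j]))
            i += 1
            j += 1
        elif left.rowids[i] < right.rowids[j]:
            i += 1   # row survives only in the left block; skip it
        else:
            j += 1   # row survives only in the right block; skip it
    return out


# Example: "price" was already filtered by a predicate, "region" was not.
price = TransferBlock("price", [0, 2, 5], [10.0, 12.5, 7.0])
region = TransferBlock("region", [0, 1, 2, 5], ["N", "S", "E", "W"])
rows = reconstruct(price, region)
```

Here `rows` contains only the rowids present in both blocks, so a filter applied to one column carries over to the reconstructed tuples.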
Then, considering that the data in a data warehouse are massive and sorted, we optimize several operator nodes: we use the factory pattern to build operator nodes, which reduces the cost of repeated judgments; pre-evaluate predicates in the selection node to avoid unnecessary judgments; redesign the hash join node to be fact-table aware; improve the traditional sort join node so that it exploits the characteristics of joins between fact tables and dimension tables; and design a method for extracting both fixed-length and variable-length columns, as well as a print node that can be applied to 19 different scenarios. Throughout development, we place loops inside functions rather than calling functions inside loops, to reduce function call overhead.

On the strategy side, we first propose a reusability estimation model based on the relative position of a given operator node in the physical execution tree and on the estimated volume of the intermediate results it produces during execution. We then provide a reuse buffer scheduling algorithm based on the results of the reusability estimation model, and optimize it as well.

Finally, we summarize the current state of optimization of column-store query execution engines for data warehouse management systems and outline future research.
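The factory-built operator nodes and the loops-inside-functions guideline can be illustrated together. The sketch below is a hedged example, not code from the thesis: `make_selection` is a hypothetical factory that resolves the comparison operator once, when the node is built, and returns a selection function that loops over a whole block of values internally. The engine then makes one call per block instead of one call and one operator check per value.

```python
import operator
from typing import Any, Callable, List, Sequence

# Comparison operators a selection node might support (an assumed set).
_COMPARATORS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}


def make_selection(op: str, constant: Any) -> Callable[[Sequence[Any]], List[int]]:
    """Factory: decide which comparison to use once, at node-construction time."""
    compare = _COMPARATORS[op]  # the "judgment" happens here, not per value

    def select(values: Sequence[Any]) -> List[int]:
        # The loop lives inside the function: one call handles a whole block
        # and returns the positions of the qualifying values.
        return [pos for pos, v in enumerate(values) if compare(v, constant)]

    return select


# Build the node once per query, then call it once per block.
select_gt_10 = make_selection(">", 10)
positions = select_gt_10([5, 11, 10, 42])
```

The returned positions can be passed to later operators, which fits a column-store design in which values from other columns are fetched by position only for qualifying rows.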

Keywords/Search Tags:

data warehouse, column-store, query execution, optimization

Related items

1	Research And Implementation Of Key Techniques For Query Rewriting In Column-Store Data Warehouse
2	The Optimization Of The Query Execution Engine In Column Oriented DWMS
3	Research On Query Optimization In Column-Oriented Data Warehouse
4	Research And Implementation Of Query Optimizing Of Column Store In Data Warehouse Management System
5	Research And Implementation Of Query Execution In Column-Stored Data Warehouse Management System
6	Research And Optimization Of Multidimensional Data Warehouse Model Based On Column Storage
7	Research On Optimization Of Big Data Storage Structure And Query
8	Research On Database Optimization And Realization Based On Simulative Column-store
9	Multi-Query Optimization Strategy Design And Implementation In Column-based OLAP System
10	Research On Query Optimization Of Data Warehouse Based On Improved Ant Colony Algorithm