
Research And Implementation Of Query Execution In Column-Stored Data Warehouse Management System

Posted on: 2012-04-18 | Degree: Master | Type: Thesis
Country: China | Candidate: W B Yu | Full Text: PDF
GTID: 2178330332485820 | Subject: Computer application technology
Abstract/Summary:
Data warehousing is an important research direction in information technology and has attracted growing attention. Early data warehouses were researched and built on top of relational database management systems. As data volumes expand and demands on query performance grow, data warehouses based on relational databases can no longer meet current requirements. Implementing a stand-alone data warehouse management system (DWMS) has therefore become a pressing need for data warehouse research and application.

In a data warehouse environment, column storage has a distinct advantage over row storage: it avoids reading columns that are irrelevant to a query, and it facilitates data compression and query execution on compressed data, both of which improve query performance. A DWMS based on column storage changes how data is stored and queried, but a query execution engine designed for row storage cannot adapt to the column storage model. Research on a query execution engine based on column storage is therefore necessary in order to exploit the performance of column storage fully. Drawing on the research and development of an actual project, this paper studies query execution technologies for column-stored systems and makes optimizations and improvements in several areas.

Both the materialization method and the materialization strategy directly affect the efficiency of query execution. To address the high cost of the traditional rowid-based reconstruction method, this paper proposes a new tuple reconstruction method, the address mapping index (AMI). This method stores query-relevant columns in the leaf blocks of the data index and establishes address mapping indices for the non-search-key columns of the data index.
The address set that satisfies the given query conditions is obtained during query execution by combining the address ranges returned by the data index and the address mapping indices. The executor then fetches the corresponding data blocks from the data index according to the address set, which avoids the reconstruction cost and improves the efficiency of multi-column queries.

Traditional materialization strategies have their own defects: early materialization reconstructs a large number of unnecessary tuples, and late materialization may extract the same column more than once. To address these defects, this paper proposes a new materialization strategy called the Value-path Materialization Strategy (VPMS). It defines a new descriptor structure, the "pass block", for intermediate results during physical execution. The pass block separates the location information of data that still has to be reconstructed from the actual column values. Following the value-path, column values are saved in the value area of the pass block, which reduces the number of independent tuples that must be constructed. Physical operators are implemented with block iteration in place of traditional tuple iteration, which reduces the number of recursive calls and the depth of iteration. The paper also proposes an "execution custom" technique to reduce unnecessary repeated judgments. Together, these measures improve the efficiency of query execution.

This paper has three parts. First, it studies column storage technologies and describes the architecture of the DWMS and its query processor, which lays the groundwork for the in-depth study. Second, it studies query execution technologies for column stores, including block iterators, materialization strategies, tuple reconstruction, and compressed execution, which form the core of the work. Finally, building on this research, it describes the design and implementation of the execution engine in detail, which is the result of the work.
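The idea of replacing tuple iteration with block iteration can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the names `scan_blocks`, `filter_blocks`, `count_blocks`, and the block size of 4 are hypothetical and chosen only for illustration. The sketch shows that an operator which passes whole blocks upward is invoked once per block instead of once per value.

```python
from typing import Callable, Iterable, Iterator, List

BLOCK_SIZE = 4  # hypothetical value; chosen small so the example is easy to trace


def scan_blocks(column: List[int], block_size: int = BLOCK_SIZE) -> Iterator[List[int]]:
    """Scan operator: yield one column as consecutive blocks of values."""
    for start in range(0, len(column), block_size):
        yield column[start:start + block_size]


def filter_blocks(blocks: Iterable[List[int]],
                  predicate: Callable[[int], bool]) -> Iterator[List[int]]:
    """Selection operator: process a whole block per call, drop empty results."""
    for block in blocks:
        selected = [value for value in block if predicate(value)]
        if selected:
            yield selected


def count_blocks(column: List[int], block_size: int) -> int:
    """Count how many times the scan hands a result to its parent operator."""
    return sum(1 for _ in scan_blocks(column, block_size))


prices = [5, 12, 7, 30, 18, 2, 25, 9, 14]

# Flatten the filtered blocks to obtain the final values.
result = [value
          for block in filter_blocks(scan_blocks(prices), lambda v: v > 10)
          for value in block]

calls_with_blocks = count_blocks(prices, BLOCK_SIZE)  # block-at-a-time
calls_with_tuples = count_blocks(prices, 1)           # degenerates to value-at-a-time
```

With nine values and a block size of 4, the scan hands three blocks to the filter, whereas a block size of 1 (equivalent to tuple iteration over a single column) requires nine hand-offs. The per-call overhead between operators shrinks accordingly as the block size grows.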
Keywords/Search Tags:column storage, data warehouse management system, query execution, tuple reconstruction, block iterator, execution custom