
Research And Implementation Of Query Optimization In Column-Oriented Compressed Data

Posted on: 2012-07-01    Degree: Master    Type: Thesis
Country: China    Candidate: H Y Li
GTID: 2178330332486260    Subject: Computer application technology
Abstract/Summary:
Traditional write-optimized database management systems store data by rows. Column-oriented storage instead stores a table column by column, and it performs significantly better in read-optimized database management systems. A query reads only the columns it references and skips irrelevant ones, which makes query processing highly efficient. Data warehouses hold very large amounts of information, so managing this data effectively is an enormous challenge for data warehouse management systems. Data compression is an effective way to address this problem, which makes it worthwhile to study how to query and optimize column-oriented data that has been compressed.

This work is based on the research project "The research and implementation of DWMS prototype system" of the "DaMeng" laboratory. Its main tasks are as follows:

1) Study an existing dictionary-based order-preserving string compression method. Its innovative shared-leaves index structure is retained, and various deficiencies of the original method are addressed. The paper proposes a new probability-based order-preserving string compression method that compresses and decompresses string attributes quickly. This reduces the time the system spends on compression during queries.

2) Study query strategies for compressed column-oriented data. The traditional cost model is modified to include the cost of compression and decompression. To minimize CPU consumption, the paper analyzes how to decide sensibly when to decompress data in cases where decompression is unavoidable. It then gives a series of concrete algorithms for queries on compressed data, covering selection, join, and aggregation operations.

3) Drawing on the characteristics of compressed column-oriented data and on existing database query optimization techniques, this paper proposes a number of optimization strategies for querying compressed data. An appropriate choice of index structure is proposed to support fast data retrieval, and temporary tables are used to improve query speed. Optimized rewriting strategies are then implemented for the predicates and subqueries that frequently appear in query statements. Finally, optimized algorithms for selection and aggregation operations on compressed data are presented.

4) The proposed probability-based order-preserving string compression method is implemented on the string attributes of DWMS, and the compression efficiency of strings is compared. Some of the query optimization strategies are then applied to the compressed DWMS. A series of experiments measures query execution time under the various optimization strategies.

The probability-based order-preserving string compression method proposed in this paper decompresses string attributes quickly and reduces the system's query time, which achieves the goal of query optimization. In addition, several of the proposed query strategies allow queries to run directly on compressed data without decompression, which achieves query optimization on compressed data.
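The abstract does not describe how the probability-based method works, so what follows is only a minimal sketch of the general idea behind order-preserving dictionary compression. It is not the method of this paper. Each distinct string receives an integer code in sorted order, so comparing codes gives the same result as comparing the strings. A string range predicate can therefore be translated once into a code range and evaluated directly on the compressed column, and a grouping aggregation can run on codes, decoding only the group keys at the end. All names (OrderPreservingDictionary, compress_column, select_range, count_by_value) are hypothetical. A plain sorted Python list stands in for the shared-leaves index structure mentioned above.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


class OrderPreservingDictionary:
    """Hypothetical sketch: a static sorted dictionary in which
    code order equals string order (a < b  <=>  code(a) < code(b))."""

    def __init__(self, values):
        # Assign codes 0..n-1 to the distinct strings in sorted order.
        self.decode_table = sorted(set(values))
        self.encode_table = {s: i for i, s in enumerate(self.decode_table)}

    def encode(self, s):
        return self.encode_table[s]

    def decode(self, code):
        return self.decode_table[code]

    def code_range(self, low, high):
        """Translate the string predicate low <= s <= high into the
        half-open code range [lo_code, hi_code). Only the dictionary is
        searched; the column data is not touched."""
        lo_code = bisect_left(self.decode_table, low)
        hi_code = bisect_right(self.decode_table, high)
        return lo_code, hi_code


def compress_column(values):
    """Build the dictionary and return it with the encoded (integer) column."""
    d = OrderPreservingDictionary(values)
    return d, [d.encode(v) for v in values]


def select_range(codes, d, low, high):
    """Selection on compressed data: return the row ids whose string lies
    in [low, high] by comparing integer codes only, with no decompression."""
    lo, hi = d.code_range(low, high)
    return [row for row, c in enumerate(codes) if lo <= c < hi]


def count_by_value(codes, d):
    """Aggregation on compressed data: GROUP BY + COUNT over the codes,
    decoding only the distinct group keys in the result."""
    counts = Counter(codes)
    return {d.decode(c): n for c, n in sorted(counts.items())}
```

As an example, for the column ["Wuhan", "Beijing", "Shanghai", "Wuhan", "Beijing", "Chengdu"] the codes are [3, 0, 2, 3, 0, 1]. The call select_range(codes, d, "C", "T") returns rows [2, 5] (Shanghai and Chengdu) without decoding any row. A static sorted dictionary like this one has to be rebuilt, and the column re-encoded, whenever a new distinct string arrives. This sketch ignores that update cost.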
Keywords/Search Tags: column-oriented, data compression, data decompression, query optimization