Research And Implementation Of Compression Technology In Column-Oriented Data Warehouse

Posted on: 2014-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: B T Long
Full Text: PDF
GTID: 2248330395980924
Subject: Computer system architecture
Abstract/Summary:
As information has become a key factor in enterprise survival and development, extracting and analyzing information from huge amounts of data to support decision-making has grown in importance. The data warehouse, as an important analysis tool for massive data, is attracting increasing attention.
Traditional row-oriented database management systems can no longer serve analytic queries efficiently, and column-oriented storage architectures are receiving more attention. In application environments such as analytical queries in a data warehouse or business intelligence, a column-oriented storage architecture avoids reading irrelevant columns during query execution, which gives it an advantage over a row-oriented database.
Disk I/O is the main bottleneck of query processing in a data warehouse and carries a high time cost, so reducing the amount of I/O can significantly improve query efficiency. Column-store technology stores data of the same type together, which increases the similarity between adjacent values. A data warehouse using column-store technology therefore compresses data more effectively than one using a traditional row store, and data compression is one of the most important topics in column-oriented data warehouse management systems.
Based on the characteristics of the column-oriented data warehouse management system, this paper presents the design and implementation of a compression model, together with the design and implementation of decompression and of query execution directly on compressed data. It then proposes an improved version of a classic data compression algorithm: simple dictionary encoding based on a dynamic dictionary.
The proposed method combines a column-level dictionary with sector-level dictionaries and counts the occurrence probability of every data value in each sector, which supports building a streamlined, lightweight column-level dictionary. As a result, both the compression ratio and query performance improve. Finally, experimental results on the Star Schema Benchmark (SSB) data set verify the effectiveness of the proposed method.
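The abstract does not give the details of the algorithm, so the following is only a minimal Python sketch of how a column-level dictionary might be combined with sector-level dictionaries. Everything specific in it is an assumption made for illustration and is not taken from the thesis: the function names, the fixed `sector_size`, the `min_share` promotion threshold, and the rule that a value enters the column-level dictionary when its relative frequency in at least one sector reaches that threshold. The last function illustrates the general idea of evaluating an equality predicate on codes without decoding rows.

```python
from collections import Counter


def build_dictionaries(column, sector_size, min_share):
    """Encode a column with a shared column-level dictionary plus one
    local dictionary per sector.

    Assumed rule (hypothetical, not from the thesis): a value is promoted
    to the column-level dictionary if its relative frequency in at least
    one sector is >= min_share.
    """
    sectors = [column[i:i + sector_size]
               for i in range(0, len(column), sector_size)]

    # Count how often each value occurs inside each sector.
    promoted = set()
    for sector in sectors:
        for value, n in Counter(sector).items():
            if n / len(sector) >= min_share:
                promoted.add(value)

    column_dict = sorted(promoted)
    column_codes = {v: c for c, v in enumerate(column_dict)}
    base = len(column_dict)  # local codes start after the shared codes

    encoded = []
    for sector in sectors:
        local_values = sorted(set(sector) - promoted)
        local_codes = {v: base + c for c, v in enumerate(local_values)}
        codes = [column_codes[v] if v in column_codes else local_codes[v]
                 for v in sector]
        encoded.append((local_values, codes))
    return column_dict, encoded


def decode(column_dict, encoded):
    """Rebuild the original column from the two-level encoding."""
    base = len(column_dict)
    out = []
    for local_values, codes in encoded:
        for c in codes:
            out.append(column_dict[c] if c < base else local_values[c - base])
    return out


def count_equal(column_dict, encoded, target):
    """Count rows equal to target by comparing integer codes only."""
    base = len(column_dict)
    total = 0
    for local_values, codes in encoded:
        if target in column_dict:
            code = column_dict.index(target)
        elif target in local_values:
            code = base + local_values.index(target)
        else:
            continue  # target cannot occur in this sector
        total += codes.count(code)
    return total


# Hypothetical sample data, loosely modelled on a region column.
regions = ["ASIA", "ASIA", "EUROPE", "ASIA",
           "AFRICA", "ASIA", "ASIA", "AMERICA"]
shared, sectors = build_dictionaries(regions, sector_size=4, min_share=0.5)
print(shared, sectors)
```

In this sketch, frequent values get small codes from the shared dictionary, while rare values stay in the local dictionary of the sector where they occur, which keeps the column-level dictionary short. A sector whose dictionaries do not contain the searched value is skipped without scanning its codes.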
Keywords/Search Tags: data warehouse, column store, data compression