
Research And Implementation Of DWMS Compression Technology

Posted on: 2012-12-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Wang
GTID: 2178330332985983
Subject: Computer applications
Abstract/Summary:
As market competition intensifies, the demand for information grows day by day. Extracting the required information from massive data is extremely important for decision making, yet traditional databases cannot meet this demand. The data warehouse, a data storage technology designed specifically for decision support, emerged in response. Current data warehouse systems are mostly built on a database management system (DBMS), and there is no dedicated data warehouse management system (DWMS). As data warehouse technology continues to develop, the amount of data held in a warehouse has increased sharply, which challenges the traditional storage model and its query processing. To improve the performance of read-optimized systems, researchers began to consider a storage method different from the traditional one: column storage. Column storage uses the column as the unit of storage, so the values of the same attribute across the records of a table are stored together. When a query runs, a column-store data warehouse loads only the columns the query requires into memory, which reduces the amount of data read and makes queries more efficient. However, the amount of data a data warehouse must process is very large, so queries generate a large number of I/O operations. Because CPU processing speed and disk access speed have developed unevenly, I/O has become the bottleneck of query processing. Reducing the number of I/O operations can therefore significantly improve query efficiency. Data compression reduces the number of I/O operations, and for this reason it has become a research hotspot. In column storage, the values in a column share the same data type, which increases the similarity between adjacent values and gives better compression efficiency than traditional storage systems achieve. Compression technology is an important research field in column-oriented data management systems.
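The two ideas above, reading only the columns a query needs and compressing similar adjacent values, can be illustrated with a minimal sketch. The table, the column names, and the helper functions are hypothetical and chosen only for illustration. Run-length encoding is used as one common column compression scheme; the abstract does not state which schemes the thesis uses.

```python
from itertools import groupby


def to_columns(rows, names):
    """Pivot a list of row tuples into a dict of per-column lists."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def rle_encode(values):
    """Run-length encode a sequence as (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, length in pairs:
        out.extend([value] * length)
    return out


# Hypothetical row-oriented table: (order_id, region, quantity).
rows = [
    (1, "ASIA", 10),
    (2, "ASIA", 20),
    (3, "ASIA", 30),
    (4, "EUROPE", 40),
    (5, "EUROPE", 50),
]
columns = to_columns(rows, ["order_id", "region", "quantity"])

# A query that sums quantity touches one column, not the whole rows.
total_quantity = sum(columns["quantity"])

# Adjacent values in a single column are similar, so runs compress well.
region_rle = rle_encode(columns["region"])
```

In this sketch the five-value `region` column shrinks to two pairs, while the same values interleaved with other attributes in row order would form no runs at all.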
However, most previous compression techniques for column-oriented data apply the same algorithm to all columns and ignore the local distribution of the data, which greatly degrades compression performance. In this paper, we propose a sector-based compression pattern and, under this pattern, a novel learning-based method for selecting compression strategies in column stores. First, each data column is divided into sectors. For a given sector, we extract two references: the information of its neighboring sector and the statistical information of the column. We then learn the similarity between each reference and the given sector to obtain a recommended compression strategy. Finally, we refine the recommended strategy by partially learning the given sector itself, which guarantees its effectiveness. Experimental results on the data warehouse benchmark data set SSB demonstrate the effectiveness of the proposed method.
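The sector-based pattern can be sketched as follows: split a column into fixed-size sectors and pick a compression strategy per sector. This is not the method of the thesis. The learned similarity between a sector and its references is replaced here by fixed, hypothetical thresholds on simple sector statistics, and the three candidate strategies (run-length, dictionary, plain) and all function names are assumptions made for illustration.

```python
from itertools import groupby


def split_into_sectors(column, sector_size):
    """Cut a column into consecutive sectors of at most sector_size values."""
    return [column[i:i + sector_size] for i in range(0, len(column), sector_size)]


def sector_stats(sector):
    """Collect simple local statistics: length, number of runs, distinct values."""
    runs = sum(1 for _ in groupby(sector))
    return {"n": len(sector), "runs": runs, "distinct": len(set(sector))}


def choose_strategy(stats):
    """Pick a strategy from hypothetical thresholds (a stand-in for learning)."""
    if stats["runs"] * 2 <= stats["n"]:
        return "rle"          # long runs: run-length encoding pays off
    if stats["distinct"] * 2 <= stats["n"]:
        return "dictionary"   # few distinct values: dictionary encoding
    return "plain"            # otherwise store the values unchanged


def compress_sector(sector, strategy):
    """Encode one sector with the chosen strategy."""
    if strategy == "rle":
        return [(v, sum(1 for _ in g)) for v, g in groupby(sector)]
    if strategy == "dictionary":
        symbols = sorted(set(sector))
        index = {v: i for i, v in enumerate(symbols)}
        return (symbols, [index[v] for v in sector])
    return list(sector)


def compress_column(column, sector_size):
    """Compress each sector independently and tag it with its strategy."""
    result = []
    for sector in split_into_sectors(column, sector_size):
        strategy = choose_strategy(sector_stats(sector))
        result.append((strategy, compress_sector(sector, strategy)))
    return result


# Hypothetical column whose local distribution changes from sector to sector.
column = [7, 7, 7, 7, 1, 2, 1, 2, 3, 4, 5, 6]
compressed = compress_column(column, sector_size=4)
strategies = [strategy for strategy, _ in compressed]
```

With this input each of the three sectors receives a different strategy, which is the situation a single column-wide algorithm cannot exploit.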
Keywords/Search Tags:Column store, Data compression, Sector-based Compression, Compression Strategies