Research And Optimization Of Multidimensional Data Warehouse Model Based On Column Storage

Posted on: 2017-04-27  Degree: Master  Type: Thesis
Country: China  Candidate: X Xu  Full Text: PDF
GTID: 2308330485970922  Subject: Computer software and theory
Abstract/Summary:
In the era of big data, the volume of data held in data warehouses has grown dramatically, and traditional data warehouses based on row storage face great challenges when handling massive data. A column store keeps the values of the same attribute together, which effectively reduces reads of columns irrelevant to a query, so it is well suited to query-intensive systems such as data warehouses. In recent years, many enterprise data warehouse systems have therefore been moving from row storage to column storage. Research on column storage has produced some results, but further work is still needed on model optimization, data compression, and related topics.
Data warehouses are usually built on a multidimensional model, and queries against such a model often involve joins between dimension tables and fact tables. Under distributed column storage, the attributes of the dimension tables and the fact table are assigned to different nodes. This not only damages the integrity of the hierarchy information in the dimension tables, but also causes a large amount of data migration whenever the fact table and a dimension table must be joined. Both effects reduce system performance.
To eliminate joins between the dimension tables and the fact table while preserving the integrity of the dimension hierarchy information, this thesis draws on the idea of the universal relation model. It uses local dimension hierarchical encoding and global dimension hierarchical encoding to encode the hierarchy information of the dimension tables, which compresses that information and yields a join-free star schema. The schema preserves the integrity of the dimension hierarchy information and allows fact table data to be processed independently, making it better suited to distributed column storage systems.
Because CPU processing power and disk I/O technology have developed unevenly, disk I/O becomes the biggest bottleneck of the system when dealing with massive data. Data compression is an effective way to reduce I/O, and because a column store keeps the values of the same attribute together, adjacent values are more similar and therefore easier to compress. Based on the data characteristics of the optimized model, this thesis proposes a composite compression strategy that combines simple dictionary encoding, run-length encoding, bitmap encoding, prefix encoding, null-value compression, and LZ encoding. Compressing the data with this strategy saves a large amount of storage space and further improves system performance.
The proposed model optimization method and compression strategy were tested and evaluated on the Teradata parallel database platform. The experimental results show that both can effectively improve system performance.
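The idea of hierarchical dimension encoding can be illustrated with a minimal sketch. It is not the thesis's actual scheme: the fixed 8-bit field per level, the sample hierarchy, and the function names `build_codes`, `ancestor`, `roll_up`, and `run_length_encode` are all assumptions made for illustration. Each leaf member receives a code that concatenates one local index per hierarchy level, so a roll-up to any ancestor level needs only a bit shift on the fact table's code column rather than a join with the dimension table.

```python
from itertools import groupby

# Hypothetical dimension hierarchy (country, province, city); not from the thesis.
CITIES = [
    ("China", "Jiangsu", "Nanjing"),
    ("China", "Jiangsu", "Suzhou"),
    ("China", "Zhejiang", "Hangzhou"),
    ("USA", "California", "San Diego"),
]

BITS = 8   # assumed fixed field width per level (at most 256 siblings per parent)
DEPTH = 3  # number of levels in the sample hierarchy


def build_codes(rows):
    """Give each leaf a code that packs one local (per-parent) index per level."""
    children = {}  # parent path -> {member name: local index}
    codes = {}
    for row in rows:
        code = 0
        for depth, name in enumerate(row):
            siblings = children.setdefault(row[:depth], {})
            local = siblings.setdefault(name, len(siblings))
            code = (code << BITS) | local
        codes[row] = code
    return codes


def ancestor(code, level):
    """Truncate a leaf code to the prefix of its ancestor at `level` (1-based)."""
    return code >> (BITS * (DEPTH - level))


def roll_up(fact_codes, measures, level):
    """Aggregate a measure column by ancestor level without touching the dimension table."""
    totals = {}
    for code, value in zip(fact_codes, measures):
        key = ancestor(code, level)
        totals[key] = totals.get(key, 0) + value
    return totals


def run_length_encode(values):
    """Run-length encode a column as (value, run length) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]


codes = build_codes(CITIES)
fact_codes = [codes[c] for c in CITIES]   # code column of a toy fact table
sales = [10, 20, 30, 40]                  # measure column of the same toy table
by_country = roll_up(fact_codes, sales, level=1)
province_runs = run_length_encode([ancestor(c, 2) for c in fact_codes])
```

Because rows that share an ancestor also share a code prefix, a column sorted by code contains long runs at the higher levels, which is the kind of regularity that run-length encoding in a composite compression strategy can exploit.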
Keywords/Search Tags: Data Warehouse, OLAP, join-free star schema, column store, data compression