
Research And Implementation Of Construction Algorithms For Closed Histogram Cube

Posted on: 2014-08-01    Degree: Master    Type: Thesis
Country: China    Candidate: H Zheng    Full Text: PDF
GTID: 2308330473451247    Subject: Computer technology

Abstract/Summary:
With the widespread application of information technology in production and practice, enterprises accumulate massive amounts of data every day, which leads decision makers to rely increasingly on tools such as OLAP when they want to mine knowledge from that data. However, because of the large volume and high dimensionality of the data, some aspects of OLAP face serious challenges, especially Data Cube technology. A Data Cube is a data-preprocessing structure whose major drawback is that its construction costs enormous amounts of time and space. To mitigate this, compression techniques such as the Closed Cube were introduced, which make the Data Cube much smaller. Nevertheless, given the volume and dimensionality of the data to be processed, time and space costs remain serious problems. The Data Cube is also inflexible; the later emergence of the Histogram Data Cube enriches it considerably. With the arrival of the cloud era, and with the MapReduce model and its open-source implementation Hadoop in wide use, MapReduce-based methods for constructing Closed Cubes have emerged. These methods improve computational capacity by harnessing the power of PC clusters, so as to meet the challenges of Closed Cube construction. However, they suffer from poor efficiency, especially when the data to be processed is large and high-dimensional.

Using the idea of decomposing the cube lattice, this thesis proposes a new MapReduce-based method, called MRC-Cubing, to construct the Closed Data Cube. It divides the enormous task of computing the entire Closed Cube into many embarrassingly parallel small tasks. Compared with earlier MapReduce methods, it greatly improves the efficiency of Closed Cube computation and copes well with high-dimensional data.
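The lattice-decomposition idea can be illustrated with a toy sketch. This is not the thesis's actual MRC-Cubing implementation: the dataset, the SUM measure, and the sequential loop standing in for parallel map tasks are all illustrative assumptions, and the closure check that distinguishes a Closed Cube from a full cube is omitted for brevity.

```python
from itertools import combinations
from collections import defaultdict

# Toy relation: each tuple is (dim0, dim1, dim2, measure).
data = [
    ("a1", "b1", "c1", 2),
    ("a1", "b1", "c2", 3),
    ("a2", "b2", "c1", 5),
]
dims = 3

def cuboid_groups(data, dim_subset):
    """Group tuples by the chosen dimensions, using '*' for the rest.
    Each cuboid is an independent task, so the lattice decomposes into
    embarrassingly parallel pieces."""
    groups = defaultdict(list)
    for t in data:
        key = tuple(t[i] if i in dim_subset else "*" for i in range(dims))
        groups[key].append(t)
    return groups

# Decompose the lattice: one small task per subset of dimensions.
tasks = [frozenset(s)
         for r in range(dims + 1)
         for s in combinations(range(dims), r)]

cells = {}
for task in tasks:  # in a MapReduce setting these would run as parallel tasks
    for key, group in cuboid_groups(data, task).items():
        cells[key] = sum(t[-1] for t in group)  # SUM measure per cell

print(cells[("a1", "b1", "*")])  # → 5
```

Because no task ever needs another task's output, the cuboids can be distributed across a cluster and computed independently, which is the property the thesis exploits.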
This thesis also provides a solution to the load imbalance of this MapReduce-based method: adjust the hashing rules to balance the numbers of tuples in the border sub-closed cubes, and further decompose those border sub-closed cubes that are too large. This solution is also applicable to the construction of the Closed Histogram Cube.

Constructing a Closed Histogram Cube is expensive. Once the closed histogram cube of a dataset has been created, and new data arrives and accumulates over time, it is better to aggregate the new data into the existing cube than to construct a new closed cube entirely from scratch. Based on an analysis of the profits and costs of reusing the existing closed cube during incremental updating, this thesis proposes a method for incrementally updating an existing closed histogram cube in a single-machine environment. This method takes much less time to merge a batch of new data into the existing closed cube than constructing a new one from scratch.

Experiments on the TPC-DS benchmark datasets show that MRC-Cubing is more efficient than earlier MapReduce algorithms when processing large, high-dimensional data and makes better use of the power of PC clusters, and that the proposed incremental updating method is likewise more efficient than constructing a new closed cube from scratch.
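The incremental-updating idea can be sketched minimally as well. This is an assumption-laden illustration, not the thesis's method: it uses a plain SUM cube rather than a closed histogram cube, and the helper `build_cells` is a hypothetical name. The point it shows is that aggregating only the new batch and merging cell-by-cell avoids re-scanning the old data.

```python
from itertools import combinations
from collections import defaultdict

dims = 2

def build_cells(data):
    """Aggregate every cuboid of the lattice into (cell key -> SUM)."""
    cells = defaultdict(int)
    for r in range(dims + 1):
        for subset in combinations(range(dims), r):
            for t in data:
                key = tuple(t[i] if i in subset else "*" for i in range(dims))
                cells[key] += t[-1]
    return cells

# Existing cube, built once from the original data.
existing = [("a1", "b1", 2), ("a2", "b1", 3)]
cube = build_cells(existing)

# A new batch arrives: aggregate it alone (cheap), then merge cell-by-cell
# into the existing cube instead of rebuilding from all the data.
batch = [("a1", "b1", 4)]
for key, val in build_cells(batch).items():
    cube[key] += val

print(cube[("a1", "*")])  # → 6
```

For distributive measures like SUM the merge is a simple per-cell addition; a histogram measure would merge bucket counts the same way, which is what makes incremental updating far cheaper than full reconstruction.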
Keywords/Search Tags:OLAP, Closed Data Cube, Histogram Data Cube, MapReduce, Incremental Updating