
Design And Implementation Of Closed Histogram Cube Based On Hadoop

Posted on: 2013-11-29    Degree: Master    Type: Thesis
Country: China    Candidate: B L Li    Full Text: PDF
GTID: 2268330425991968    Subject: Computer system architecture
Abstract/Summary:
With the advent of the information age, enterprises need to process more and more data, and a key factor in whether an enterprise can develop well is its ability to obtain accurate decision-making information from large amounts of data. OLAP was proposed to address this problem, and the data cube was introduced to improve the performance of OLAP queries: aggregate values are pre-computed and stored so that OLAP applications can answer queries directly. However, pre-computation causes the size of the data cube to grow explosively. It is therefore necessary to study techniques that reduce the disk space occupied by the data cube, accelerate its computation, and improve query performance over it. Data cube computation pre-computes some or all of the aggregate values, but this approach has drawbacks; for example, aggregation analysis with constraints cannot be performed on the cube, because the cube retains too little information. In recent years, enterprises have also faced the challenge of massive data, and computing and storing a data cube in a centralized environment is limited in both computing power and storage capacity.

To solve these problems, we design and implement a closed histogram cube based on an in-depth study of the closed data cube and the Hadoop parallel computing platform. Against the huge volume of the full data cube, we adopt closed data cube technology and further compress the cube by reorganizing tuples. Against the loss of information, we propose a general model for multidimensional aggregation queries that replaces the value at each point of the multidimensional space (a specific aggregate value) with a histogram; because the original measure information is effectively preserved, this model solves the problem mentioned above. Against the massive data volume and the storage overhead of the histograms, we again rely on closed data cube technology. Hadoop's MapReduce parallel computing model provides the technical support for computing the data cube, and Hadoop's distributed file system HDFS guarantees its storage. Because multiple machines compute the closed histogram cube simultaneously, this method greatly accelerates the computation. To speed up query response and exploit the characteristics of parallel computing, we present a parallel inverted index over the closed histogram cube, which greatly reduces query time. We apply two compression methods to the inverted index; they effectively reduce its storage space and further reduce its scan time, and thus speed up queries. Based on the compressed inverted index, we present a method of index intersection in a parallel environment: it accelerates intersection by intersecting in parallel on the Map side and partially on the Reduce side, and therefore improves query performance.

The closed histogram cube we design compresses the volume of the data cube effectively, and the effect of the two compression methods on the inverted index is very obvious. Queries based on the compressed inverted index respond quickly in online aggregation analysis, and the closed histogram cube also has good scalability.
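As an illustration of the histogram-per-cell idea described above, the following minimal Hadoop MapReduce sketch (not the thesis author's code) builds an equi-width histogram of a measure for a few cuboid cells of a two-dimensional fact table. The class name HistogramCubeJob, the "dimA,dimB,measure" input layout, the bucket count, and the bucket width are all hypothetical choices made for the example; an actual closed histogram cube would additionally collapse cells into closed cells.

// Illustrative sketch only, assuming a plain-text input where each record is
// "dimA,dimB,measure"; it materializes an equi-width histogram per cuboid cell
// instead of a single aggregate value. All names and constants are hypothetical.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HistogramCubeJob {

    // Map: for each input tuple, emit one key per cuboid cell it belongs to
    // (the full group-by and the two 1-d group-bys, with '*' standing for ALL),
    // with the raw measure as the value.
    public static class CellMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] f = value.toString().split(",");
            if (f.length < 3) return;                            // skip malformed records
            String a = f[0], b = f[1], measure = f[2];
            ctx.write(new Text(a + "," + b), new Text(measure)); // (A,B) cell
            ctx.write(new Text(a + ",*"), new Text(measure));    // (A,ALL) cell
            ctx.write(new Text("*," + b), new Text(measure));    // (ALL,B) cell
        }
    }

    // Reduce: build an equi-width histogram of the measures falling into a cell,
    // so the cell keeps the measure distribution rather than one aggregate value.
    public static class HistogramReducer extends Reducer<Text, Text, Text, Text> {
        private static final int BUCKETS = 10;       // hypothetical bucket count
        private static final double WIDTH = 100.0;   // hypothetical bucket width

        @Override
        protected void reduce(Text cell, Iterable<Text> measures, Context ctx)
                throws IOException, InterruptedException {
            long[] hist = new long[BUCKETS];
            for (Text m : measures) {
                double v = Double.parseDouble(m.toString());
                int bucket = Math.min(BUCKETS - 1, (int) (v / WIDTH));
                hist[bucket]++;
            }
            StringBuilder out = new StringBuilder();
            for (long c : hist) out.append(c).append(' ');
            ctx.write(cell, new Text(out.toString().trim()));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "histogram cube sketch");
        job.setJarByClass(HistogramCubeJob.class);
        job.setMapperClass(CellMapper.class);
        job.setReducerClass(HistogramReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}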
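Similarly, the small Java sketch below illustrates, under assumptions of our own, the two ideas mentioned for the inverted index: compressing sorted posting lists (here with simple gap/delta encoding, one plausible compression method among many) and intersecting two lists to answer a query that constrains two dimension values. In the parallel setting of the abstract this intersection would be split across Map and Reduce tasks; the sketch only shows the core sequential merge, and all identifiers are hypothetical.

// Illustrative sketch only, not taken from the thesis.
import java.util.ArrayList;
import java.util.List;

public class PostingListSketch {

    // Gap-encode a sorted list of tuple ids: store each id as the difference
    // from the previous one, which keeps the stored numbers small.
    static int[] deltaEncode(int[] sortedIds) {
        int[] gaps = new int[sortedIds.length];
        int prev = 0;
        for (int i = 0; i < sortedIds.length; i++) {
            gaps[i] = sortedIds[i] - prev;
            prev = sortedIds[i];
        }
        return gaps;
    }

    // Decode gaps back into absolute tuple ids.
    static int[] deltaDecode(int[] gaps) {
        int[] ids = new int[gaps.length];
        int prev = 0;
        for (int i = 0; i < gaps.length; i++) {
            prev += gaps[i];
            ids[i] = prev;
        }
        return ids;
    }

    // Intersect two sorted posting lists with a linear merge; the result is the
    // set of tuple ids that satisfy both dimension constraints of a query.
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) { out.add(a[i]); i++; j++; }
            else if (a[i] < b[j]) i++;
            else j++;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] cityBeijing = {3, 8, 15, 42, 97};     // tuples with city = Beijing
        int[] year2012    = {8, 15, 20, 42, 100};   // tuples with year = 2012
        int[] compressed  = deltaEncode(cityBeijing);
        int[] restored    = deltaDecode(compressed);
        System.out.println("matching tuples: " + intersect(restored, year2012));
    }
}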
Keywords/Search Tags:OLAP, Histogram Cube, Closed Data Cube, Closed Histogram Cube, Hadoop