Research And Implementation Of Construction And Query Techniques Of Histogram Data Cube Based On Hadoop

Posted on:2013-01-06

Degree:Master

Type:Thesis

Country:China

Candidate:J C Yi

Full Text:PDF

GTID:2268330425997136

Subject:Computer software and theory

Abstract/Summary:

With the wide application of information technology in the Internet age, information sources are multiplying and data volumes are growing rapidly. Faced with this much data, more and more companies and organizations are focusing on data storage applications, and data warehouses are widely used in this area. For most companies and organizations, data analysis based on the data warehouse plays an increasingly important role in decision making. Meanwhile, processing big data requires substantial computing and storage power. Where traditional PCs hit bottlenecks, cloud computing and related techniques help solve the problem. How to organize data efficiently and carry out analytical processing on big data has become a hot research topic.

In this thesis, a detailed analysis of current OLAP techniques identifies some limitations of their query techniques when processing big data. We propose a new general model for multi-dimensional aggregation and analyze its feasibility for OLAP. Using the parallel processing power of MapReduce and the large storage capacity of HDFS, we complete the construction and storage of the histogram data cube. In this model, we design and implement classic OLAP aggregation algorithms such as sum and count. We also implement algorithms that traditional OLAP does not support, such as mode and median. These algorithms are implemented on MapReduce so that the computing power of Hadoop can be used efficiently for analytical processing. To address data updates on big data, we also propose an incremental update solution. With these methods, big data analysis becomes more computationally efficient and more kinds of analysis are supported.

For approximate queries on the data warehouse, the thesis proposes a new histogram partition method based on the histogram model. We discuss the inaccuracy and storage loss and redesign the aggregation algorithms to compute approximate results. This method effectively reduces the computing time and response time of aggregate queries, and thus supports users' approximate query requirements.
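
To make the idea of a histogram cube cell concrete, here is a minimal single-process sketch in Python. It is not the thesis's implementation: the rows, the function names (`map_phase`, `reduce_phase`, `incremental_update`, `bucketize`, and the `h_*` helpers) and the equal-width bucketing are all assumptions chosen for illustration, and the map and reduce steps only imitate what a Hadoop job would do across machines. It shows why storing a value-frequency histogram per cell lets sum, count, mode and median all be answered from the same structure, why an incremental update can be a histogram merge, and how a coarser histogram might trade accuracy for size.

```python
from collections import Counter

# Hypothetical fact rows: (region, year, measure value).
ROWS = [
    ("east", 2012, 5), ("east", 2012, 7), ("east", 2012, 5),
    ("west", 2012, 9), ("west", 2012, 1),
]

def map_phase(rows):
    """Emit (cell key, one-value histogram) pairs, as a mapper might."""
    for region, year, value in rows:
        yield (region, year), Counter({value: 1})

def reduce_phase(pairs):
    """Merge the histograms that share a cell key, as a reducer might."""
    cube = {}
    for key, hist in pairs:
        cube.setdefault(key, Counter()).update(hist)  # adds frequencies
    return cube

def h_count(hist):
    return sum(hist.values())

def h_sum(hist):
    return sum(value * freq for value, freq in hist.items())

def h_mode(hist):
    """Most frequent value; ties go to the smallest value (an assumption)."""
    return max(hist.items(), key=lambda vf: (vf[1], -vf[0]))[0]

def h_median(hist):
    """Lower median: smallest value whose cumulative frequency reaches half."""
    target = (h_count(hist) + 1) // 2
    running = 0
    for value in sorted(hist):
        running += hist[value]
        if running >= target:
            return value
    raise ValueError("empty histogram")

def incremental_update(cube, new_rows):
    """Fold new rows into an existing cube by merging a delta cube."""
    for key, hist in reduce_phase(map_phase(new_rows)).items():
        cube.setdefault(key, Counter()).update(hist)
    return cube

def bucketize(hist, width):
    """Coarsen to equal-width buckets keyed by lower bound (assumed scheme)."""
    coarse = Counter()
    for value, freq in hist.items():
        coarse[(value // width) * width] += freq
    return coarse

def approx_sum(coarse, width):
    """Estimate the sum by placing every value at its bucket midpoint."""
    return sum((low + width / 2) * freq for low, freq in coarse.items())

cube = reduce_phase(map_phase(ROWS))
east = cube[("east", 2012)]
print(h_count(east), h_sum(east), h_mode(east), h_median(east))
print(approx_sum(bucketize(east, 4), 4))
```

Because histograms merge by adding frequencies, the same merge serves both the reduce step and the incremental update in this sketch. The bucketed estimate differs from the exact sum, which illustrates the kind of inaccuracy an approximate query accepts in exchange for a smaller histogram.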

Keywords/Search Tags:

Hadoop, Data Cube, OLAP, Histogram, Approximate Query

Related items

1	Design And Implementation Of Closed Histogram Cube Based On Hadoop
2	Research And Implementation Of Histogram Cube Compressed Storage And Incremental Updating And Query Under Cloud Environment
3	Research And Implementation Of Construction Algorithms For Closed Histogram Cube
4	Research On Data Cube Technology Based On MapReduce
5	Cache Research OLAP Cube-based Suppliers And Implementation
6	Research And Implementation Of Online Multiple Aggregation Query System Over The Big Data
7	Research On Approximate Query Processing Techniques In The DataWarehouse
8	An Algorithm About Cube For OLAP Query Based On Partition
9	Research On Distributed OLAP Query Optimization Based On Hive
10	On-Line Analytical Processing (OLAP) & OLAP Application In Commercial Automation