
Research And Implementation Of Construction And Query Techniques Of Histogram Data Cube Based On Hadoop

Posted on: 2013-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: J C Yi
GTID: 2268330425997136
Subject: Computer software and theory
Abstract/Summary:
With the wide application of information technology in the Internet age, sources of information are multiplying and data volumes are growing rapidly. Faced with this much data, more and more companies and organizations are focusing on data storage applications, and data warehouses are widely used in this area. For most companies and organizations, data analysis based on a data warehouse plays an increasingly important role in decision making. Meanwhile, processing big data requires substantial computing and storage capacity. Where traditional PCs hit bottlenecks, cloud computing and related techniques help solve the problem. How to organize data efficiently and carry out analytical processing on big data has become a hot research topic.

In this thesis, a detailed analysis of current OLAP techniques identifies several limitations of their query techniques when processing big data. We propose a new general model for multi-dimensional aggregation and analyze its feasibility for OLAP. Using the parallel processing power of MapReduce and the storage capacity of HDFS, we construct and store the histogram data cube. Within this model, we design and implement classic OLAP aggregation algorithms such as sum and count, as well as algorithms that traditional OLAP does not support, such as mode and median. These algorithms are implemented on MapReduce so that the computing power of Hadoop can be used efficiently for analytical processing. To address data updates on big data, we also propose an incremental update solution. Together, these methods improve computational efficiency in big data analysis and support more kinds of analysis.

For approximate queries on the data warehouse, the thesis proposes a new histogram partition method based on the histogram model. We discuss the inaccuracy and storage loss and redesign the aggregation algorithms to compute approximate results. This method effectively reduces the computing time and response time of aggregate queries, and thereby supports users' approximate query requirements.
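
The abstract gives no implementation details, so the following is only a minimal single-machine sketch of the general idea: storing a value-to-frequency histogram in each cube cell makes sum, count, mode, and median all computable from the cell, and makes incremental updates a matter of adding frequencies. Every function name here is hypothetical, the MapReduce and HDFS layers are omitted, and the equal-width bucketing in `bucketize` is an assumed stand-in, not the thesis's partition method.

```python
from collections import Counter


def build_histogram(values):
    """Build one cube cell: map each distinct measure value to its frequency."""
    return Counter(values)


def merge(h1, h2):
    """Combine two cells (e.g. old data plus a new batch) by adding frequencies."""
    return h1 + h2


def hist_count(h):
    return sum(h.values())


def hist_sum(h):
    return sum(value * freq for value, freq in h.items())


def hist_mode(h):
    """Most frequent value; ties are broken toward the smallest value."""
    return max(h.items(), key=lambda kv: (kv[1], -kv[0]))[0]


def hist_median(h):
    """Lower median: walk the values in order until half the count is covered."""
    target = (hist_count(h) + 1) // 2
    running = 0
    for value in sorted(h):
        running += h[value]
        if running >= target:
            return value
    raise ValueError("empty histogram")


def bucketize(h, width):
    """Coarsen a cell into equal-width buckets keyed by bucket lower bound.

    Assumption: equal-width buckets are used only for illustration.
    """
    buckets = Counter()
    for value, freq in h.items():
        buckets[(value // width) * width] += freq
    return buckets


def approx_sum(buckets, width):
    """Approximate sum, treating every value as its bucket's midpoint."""
    return sum((low + width / 2) * freq for low, freq in buckets.items())


cell = build_histogram([3, 1, 4, 1, 5])
updated = merge(cell, build_histogram([9, 2, 6]))  # incremental update
coarse = bucketize(updated, 5)
print(hist_count(updated), hist_sum(updated), hist_mode(updated), hist_median(updated))
print(approx_sum(coarse, 5))
```

Because merging is plain addition of frequencies, partial histograms can be combined in any order, which is the property a parallel build or an incremental update would rely on. The coarse histogram stores fewer entries at the cost of an approximate answer: in this example the approximate sum differs from the exact one.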
Keywords/Search Tags: Hadoop, Data Cube, OLAP, Histogram, Approximate Query