Research On Parallel And Distributed Processing Technology Of Data Cube In OLAP System

Posted on:2008-10-27

Degree:Master

Type:Thesis

Country:China

Candidate:Q Gu

Full Text:PDF

GTID:2178360215474792

Subject:Computer application technology

Abstract/Summary:

The data cube is a technique that provides fast access to the data in a data warehouse, and it is a central topic of On-Line Analytical Processing (OLAP). With a data cube, decision analysts get both highly efficient data access and quick access to useful decision information. In this thesis, we present a cube storage and OLAP query system for high dimensional data in a parallel environment, and a cube storage and OLAP query system for massive trace data in a network environment. We also outline directions for further research in this area.

As a data warehouse grows, the dimensions of the cube and their hierarchical structures become more and more complicated. In both computation time and storage space, materializing the whole cube on a single processor is immensely expensive. Improved computation methods and compression methods such as Iceberg Cube, Condensed Cube and Dwarf have been adopted, but they cannot fundamentally solve the storage problem of high dimensional data. Parallel computation offers a new approach to this problem.

To avoid the "dimension disaster" (the curse of dimensionality) caused by high dimensional data, we present HDCube (High Dimensional Cube), a highly efficient storage structure for the parallel environment. HDCube segments the high dimensional dataset into a set of disjoint low dimensional datasets according to the number of processors. Parallel processing is then used to compute the LDCube (Low Dimensional Cube) belonging to each processor. Meanwhile, exploiting the hierarchical characteristics of dimensions, we use an index technique based on DHE (Dimension Hierarchical Encoding) to generate a DHE table for each dimension, which substitutes for the original keywords in the dimension table. This both compresses the dimension keywords and speeds up data retrieval in the cube.

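
As a rough illustration of the two ideas above, the following Python sketch splits a set of dimensions into disjoint groups (one per processor), fully materializes a small cube over one group, and builds a hierarchical code table for one dimension. It is only a sketch under stated assumptions, not the thesis's implementation: the round-robin split, the tuple-of-sibling-indices encoding, and the names `split_dimensions`, `low_dim_cube` and `dhe_table` are all hypothetical, since the abstract does not specify how HDCube partitions dimensions or how DHE codes are laid out.

```python
from collections import defaultdict
from itertools import combinations


def split_dimensions(dims, n_procs):
    """Split dimension names into at most n_procs disjoint groups.

    Assumption: a simple round-robin split; the abstract only says the
    split depends on the number of processors.
    """
    groups = [dims[i::n_procs] for i in range(n_procs)]
    return [g for g in groups if g]


def low_dim_cube(rows, group, measure):
    """Materialize every group-by over one dimension group, summing `measure`.

    Keys are (dimension subset, value tuple). In a parallel setting each
    processor would run this on its own group.
    """
    cube = defaultdict(float)
    for r in range(len(group) + 1):
        for subset in combinations(group, r):
            for row in rows:
                key = (subset, tuple(row[d] for d in subset))
                cube[key] += row[measure]
    return dict(cube)


def dhe_table(paths):
    """Map each hierarchy path to a tuple of per-level sibling indices.

    Assumption: one small integer per level stands in for the original
    keyword, so a shared code prefix means a shared ancestor.
    """
    child_ids = defaultdict(dict)  # parent prefix -> {member: index}
    table = {}
    for path in paths:
        code = []
        for depth, member in enumerate(path):
            siblings = child_ids[path[:depth]]
            code.append(siblings.setdefault(member, len(siblings)))
        table[path] = tuple(code)
    return table
```

For example, `split_dimensions(["a", "b", "c", "d", "e"], 2)` yields the groups `["a", "c", "e"]` and `["b", "d"]`, and `dhe_table([("China", "Beijing"), ("China", "Shanghai")])` assigns the codes `(0, 0)` and `(0, 1)`, so both cities share the prefix of their common parent.
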
We built the HDCube storage and OLAP query system in the parallel environment. The system builds and updates the HDCube in parallel and provides algorithms for parallel querying and optimization. Theoretical analysis and experiments show that HDCube in the parallel environment achieves high efficiency and speedup compared with the traditional full materialization and partial materialization methods.

Besides traditional application fields such as finance, insurance and telecom, OLAP has also developed widely in logistics. For the massive trace data generated by the transportation of dangerous goods, we present a new cube storage structure, MTCube (Massive Trace Cube). MTCube suits the massive trace data that is ubiquitous in such applications. It compresses the data dramatically and, at the same time, records the trace by adding a hierarchy prefix trace and a hierarchy prefix id to the storage structure, which helps users track dangerous goods and analyse the massive data efficiently.

Because numerous geographically distributed trace data sources require efficient processing and analysis, the original centralized management can no longer meet customers' needs. We therefore propose the MTCube storage and OLAP system based on the network environment, which is composed of several LCOS (Local Cube/OLAP System) instances located at the local stations. Both theoretical analysis and experiments show that MTCube supports efficient control and risk prediction for dangerous goods as well as high compression of the massive trace data.

Keywords/Search Tags:

Cube, OLAP, Parallel computation, Distributed processing

Related items

1	Research And Application Of OLAP On Macroeconomic Intelligent Decision Support System
2	On-Line Analytical Processing (OLAP) & OLAP Application In Commercial Automation
3	Optimization and generalization of OLAP cube processing in relational database systems
4	Multidimensional Data Model For Mining And Analysis Based On Multiple Structure Data Cube
5	Efficient Data-Cube Computation And Application In OLAP MINING
6	Bank Decision Support System And Implementation Of Olap Technology
7	Research On Fast Data Cube Computation Method Based On Spark Platform
8	Research On OLAP Query Technology Based On Distributed Memory
9	Cache Research OLAP Cube-based Suppliers And Implementation
10	Research And Implementation Of Data Cube Techniques For OLAP Analysis Of Network Security