Research On Parallel And Distributed Processing Technology Of Data Cube In OLAP System

Posted on: 2008-10-27
Degree: Master
Type: Thesis
Country: China
Candidate: Q Gu
GTID: 2178360215474792
Subject: Computer application technology

Abstract/Summary:
The data cube is a technique for fast access to the data in a data warehouse, and it is a central topic of On-Line Analytical Processing (OLAP). With a data cube, decision analysts get both efficient data access and quick access to useful decision information.

In this paper, we present a cube storage and OLAP query system for high-dimensional data in a parallel environment. We also present a cube storage and OLAP query system for massive trace data in a network environment. Finally, we outline directions for further research in this area.

As a data warehouse grows, the dimensions of the cube and their hierarchical structures become more and more complicated. In terms of both computation time and storage space, it is immensely expensive to materialize the whole cube on a single processor. Improved computation methods and compression methods such as Iceberg Cube, Condensed Cube and Dwarf have been adopted, but they cannot fundamentally solve the storage problem of high-dimensional data. Parallel computation offers a new approach to this problem.

To avoid the "dimension disaster" (the curse of dimensionality) caused by high-dimensional data, we present HDCube (High Dimensional Cube), an efficient storage structure for the parallel environment. HDCube segments the high-dimensional dataset into a set of disjoint low-dimensional datasets according to the number of processors. Then, using parallel processing, each processor computes the LDCube (Low Dimensional Cube) that belongs to it. In addition, exploiting the hierarchical nature of dimensions, we use an index technique based on DHE (Dimension Hierarchical Encoding) to generate a DHE table for each dimension, which replaces the original keywords in the dimension table. This both compresses the dimension keywords and speeds up data retrieval in the cube.
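As a rough illustration of the two ideas above, the following Python sketch splits a list of dimensions into disjoint groups, one per processor, and builds a hierarchy-aware integer code table for one dimension. The abstract does not say how HDCube assigns dimensions to processors or how DHE codes are laid out, so the round-robin split, the bit-packed code layout, and the names `partition_dimensions` and `build_dhe_table` are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Path = Tuple[str, ...]


def partition_dimensions(dims: Sequence[str], n_procs: int) -> List[List[str]]:
    """Split dimensions into n_procs disjoint groups (round-robin: an assumption)."""
    if n_procs <= 0:
        raise ValueError("n_procs must be positive")
    groups: List[List[str]] = [[] for _ in range(n_procs)]
    for i, dim in enumerate(dims):
        groups[i % n_procs].append(dim)
    return groups


def build_dhe_table(paths: Sequence[Path]) -> Tuple[Dict[Path, int], List[int]]:
    """Give each hierarchy path an integer code whose high bits identify its ancestors.

    Hypothetical layout: every level gets a fixed-width bit field holding the
    member's position among its siblings; the fields are concatenated from the
    top level to the bottom level.
    """
    if not paths:
        raise ValueError("at least one hierarchy path is required")
    depth = len(paths[0])
    # sibling_ids[level][parent_prefix] maps member name -> id among its siblings
    sibling_ids: List[Dict[Path, Dict[str, int]]] = [dict() for _ in range(depth)]
    for path in paths:
        if len(path) != depth:
            raise ValueError("all paths must have the same number of levels")
        for level in range(depth):
            siblings = sibling_ids[level].setdefault(path[:level], {})
            siblings.setdefault(path[level], len(siblings))

    # Width of each level's bit field: enough bits for the largest sibling group.
    bits = [
        max(1, (max(len(s) for s in level_map.values()) - 1).bit_length())
        for level_map in sibling_ids
    ]

    table: Dict[Path, int] = {}
    for path in paths:
        code = 0
        for level in range(depth):
            local_id = sibling_ids[level][path[:level]][path[level]]
            code = (code << bits[level]) | local_id
        table[path] = code
    return table, bits


if __name__ == "__main__":
    print(partition_dimensions(["time", "region", "product", "store", "channel"], 2))
    region_paths = [
        ("Asia", "China", "Beijing"),
        ("Asia", "China", "Shanghai"),
        ("Asia", "Japan", "Tokyo"),
        ("Europe", "France", "Paris"),
    ]
    codes, widths = build_dhe_table(region_paths)
    for path, code in codes.items():
        print(path, format(code, "b").zfill(sum(widths)))
```

With a layout like this, members that share an ancestor share the high bits of their codes, so a query on a higher hierarchy level can be answered by comparing shifted integers instead of matching keyword strings.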
We build the HDCube storage and OLAP query system in the parallel environment. The system builds and updates the HDCube in parallel, and it provides algorithms for parallel query and optimization. Theoretical analysis and experiments show that HDCube in the parallel environment achieves high efficiency and speedup compared with the traditional full materialization and partial materialization methods.

Besides traditional application fields such as finance, insurance and telecom, OLAP is also being widely developed in logistics. Motivated by the massive trace data generated by the transportation of dangerous goods, we present a new cube storage structure, MTCube (Massive Trace Cube). MTCube suits the massive trace data that is ubiquitous in such applications. It compresses the data dramatically and at the same time records the trace by adding a hierarchy prefix trace and a hierarchy prefix id to the storage structure, which helps users track dangerous goods as well as analyse the massive data efficiently.

Because numerous geographically distributed trace data sources require efficient processing and analysis, the original centralized management can no longer meet customers' needs. We therefore propose the MTCube storage and OLAP system based on the network environment, which is composed of several LCOS (Local Cube/OLAP System) instances located at local stations. Both theoretical analysis and experiments show that MTCube supports efficient control and risk prediction for dangerous goods as well as high compression of the massive trace data.
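One way to picture the prefix-based trace storage mentioned for MTCube is a prefix tree in which traces that start with the same route share their stored prefix. The sketch below is only an interpretation under that assumption: the abstract does not define the actual MTCube layout, and the class `TracePrefixStore`, its fields, and the sample location names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


class TracePrefixStore:
    """Stores traces in a prefix tree so that shared route prefixes are kept once."""

    def __init__(self) -> None:
        # Node 0 is the root (empty trace); each node is (parent id, location).
        self.nodes: List[Tuple[int, str]] = [(-1, "")]
        # (parent id, location) -> child node id
        self.children: Dict[Tuple[int, str], int] = {}
        # node id -> number of traces that end exactly at this node
        self.counts: Dict[int, int] = {}

    def add(self, trace: Sequence[str]) -> int:
        """Insert a trace and return the id of the node that represents it."""
        node = 0
        for location in trace:
            key = (node, location)
            if key not in self.children:
                self.children[key] = len(self.nodes)
                self.nodes.append((node, location))
            node = self.children[key]
        self.counts[node] = self.counts.get(node, 0) + 1
        return node

    def trace_of(self, node: int) -> List[str]:
        """Rebuild the full trace by walking parent links back to the root."""
        locations: List[str] = []
        while node != 0:
            parent, location = self.nodes[node]
            locations.append(location)
            node = parent
        return locations[::-1]


if __name__ == "__main__":
    store = TracePrefixStore()
    first = store.add(["Depot", "Hub1", "CityA"])
    second = store.add(["Depot", "Hub1", "CityB"])
    print(store.trace_of(first), store.trace_of(second), len(store.nodes))
```

In this sketch a whole trace is identified by a single node id, so a fact record only needs to carry that id, while the tree still allows the full route of a shipment to be reconstructed for tracking.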
Keywords/Search Tags: Cube, OLAP, Parallel computation, Distributed processing