
Research On The Processing Of Multidimensional Data And Related Querying Technology In Data Warehouses

Posted on: 2006-06-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z H Li
GTID: 1118360182956838
Subject: Computer software and theory

Abstract/Summary:
Data warehouses serve as the data sources for business intelligence activities such as reporting, forecasting and multidimensional analysis. A data warehouse holds large amounts of historical data to support managers' analyses. OLAP (On-Line Analytical Processing) organizes the data in a data warehouse into cubes, which are composed of dimensions and facts. OLAP systems support decision making by providing dynamic analytical operations, such as drill-down and roll-up, over these high volumes of data. Many problems remain open. These include the modeling of multidimensional data, the processing (modeling or transformation) of irregular dimensions, and the performance of range queries in OLAP. A further open problem is the lack of unified methodological foundations for the modeling, design and development of ETL tools. Pre-aggregation is one of the most effective OLAP techniques for ensuring quick responses to user queries in data warehouses. Its applicability rests on the premise that lower-level aggregates can be reused to compute higher-level aggregates. In real-world applications, however, the complicated structure of some irregular dimensions makes this premise very hard to guarantee. Irregular dimensions can lead to four problems:

(a) minimal members may occur in the domains of different levels;
(b) a level may have more than one parent level;
(c) the hierarchies in a dimension may have different lengths;
(d) the mapping from the domain of a child level to that of a parent level may be partial.

As for range-sum queries, current techniques such as RPC, HDC, DDC, SDDC and DPRC support them only on cubes whose dimensions have continuous domains. They cannot handle range-sum queries on cubes whose dimensions have discrete domains.
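To see why a partial child-to-parent mapping, as in problem (d), breaks the reuse premise, consider the following minimal sketch. The Location dimension, its members and the sales figures are all hypothetical and are not taken from the thesis. One city is assumed to map directly to its country and to skip the state level. Reusing the pre-aggregated state totals to compute the country total silently drops that city's sales.

```python
from collections import defaultdict

# Hypothetical Location dimension: city -> state -> country.
# "Washington DC" is assumed to map straight to the country, skipping the
# state level, so the city->state mapping is partial (non-covering).
city_to_state = {"Austin": "Texas", "Dallas": "Texas", "Seattle": "Washington"}
state_to_country = {"Texas": "USA", "Washington": "USA"}
city_to_country_direct = {"Washington DC": "USA"}

# Base facts: sales per city (made-up numbers for illustration only).
sales = {"Austin": 10, "Dallas": 20, "Seattle": 5, "Washington DC": 7}


def roll_up(facts, mapping):
    """Aggregate a {member: value} dict to the parent level, dropping members
    that have no parent in `mapping` (what a naive roll-up does)."""
    out = defaultdict(int)
    for member, value in facts.items():
        if member in mapping:
            out[mapping[member]] += value
    return dict(out)


def true_country_totals(facts):
    """Compute country totals directly from the base facts."""
    out = defaultdict(int)
    for city, value in facts.items():
        country = (state_to_country.get(city_to_state.get(city))
                   or city_to_country_direct.get(city))
        out[country] += value
    return dict(out)


state_totals = roll_up(sales, city_to_state)       # pre-aggregated state level
reused = roll_up(state_totals, state_to_country)   # reuse it for countries
direct = true_country_totals(sales)

print(state_totals)  # {'Texas': 30, 'Washington': 5}
print(reused)        # {'USA': 35}
print(direct)        # {'USA': 42}
```

The two country totals disagree (35 versus 42). A system relying on pre-aggregation would therefore return a wrong answer unless the dimension is first made regular.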
To solve these problems, this thesis focuses on three topics: the modeling and transformation of irregular dimensions, the performance of range-sum queries, and the conceptual modeling of ETL processes. Our contributions consist of the following four parts.

(1) To address the problems caused by irregular dimensions, we propose a novel multidimensional data model with two advantages:
(a) It is compatible with other multidimensional models because it defines the partial order among levels based on the mappings between them. It can therefore be combined with various logical and physical models to build data warehouses.
(b) Its cube model allows members from different levels to appear in the same cube, so it supports the modeling of cubes with non-onto dimensions.
Example analyses show that the model provides a feasible way to support both regular and irregular dimensions in real applications.

(2) To use irregular dimensions, especially non-covering dimensions, in traditional multidimensional data models, we define four types of non-covering dimensions. These are type A, type B, type C and type D, and they are based on combinations of different types of irregular mappings between levels. For dimensions of type A, we give the algorithm MakeCoveringA, which transforms them into covering dimensions. Similarly, the algorithm MakeCoveringB transforms dimensions of type B into covering ones. In addition, the algorithm MakeSelfOnto transforms self-into mappings into self-onto mappings, and the algorithm MakeOnto transforms non-onto mappings into onto mappings. Combining MakeCoveringA and MakeCoveringB transforms dimensions of type C into covering ones. Combining MakeCoveringA, MakeCoveringB, MakeSelfOnto and MakeOnto transforms a dimension of type D into a regular one.
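The general idea of making a non-covering dimension covering can be sketched by filling the skipped level with placeholder members. The following is a minimal, hypothetical illustration of that idea only. It is not the thesis's MakeCoveringA or MakeCoveringB, whose exact rules are not given here. The dimension, the function name `make_covering` and the placeholder naming scheme are all assumptions. Each city that skips the state level receives its own placeholder state, which maps to the city's country. As a result, every path city → state → country exists, and the state-level aggregates can be reused safely.

```python
from collections import defaultdict

# Hypothetical dimension with levels city < state < country. Each city maps
# either to a state or, skipping the state level, directly to a country.
city_parent = {"Austin": ("state", "Texas"),
               "Dallas": ("state", "Texas"),
               "Seattle": ("state", "Washington"),
               "Washington DC": ("country", "USA")}
state_to_country = {"Texas": "USA", "Washington": "USA"}


def make_covering(city_parent, state_to_country):
    """Return a total city->state mapping and an extended state->country
    mapping. A city that skips the state level gets a placeholder state
    (assumed naming scheme 'placeholder(<city>)') mapped to its country."""
    city_to_state = {}
    s2c = dict(state_to_country)
    for city, (level, parent) in city_parent.items():
        if level == "state":
            city_to_state[city] = parent
        else:  # level == "country": insert a placeholder state member
            placeholder = f"placeholder({city})"
            city_to_state[city] = placeholder
            s2c[placeholder] = parent
    return city_to_state, s2c


def roll_up(facts, mapping):
    """Aggregate to the parent level; the mapping is assumed to be total."""
    out = defaultdict(int)
    for member, value in facts.items():
        out[mapping[member]] += value
    return dict(out)


sales = {"Austin": 10, "Dallas": 20, "Seattle": 5, "Washington DC": 7}
c2s, s2c = make_covering(city_parent, state_to_country)
state_totals = roll_up(sales, c2s)
print(state_totals)
# {'Texas': 30, 'Washington': 5, 'placeholder(Washington DC)': 7}
print(roll_up(state_totals, s2c))  # {'USA': 42}
```

After the transformation, rolling up through the state level gives the same country total as aggregating the base facts directly. The cost is a placeholder member that analysts may need to hide or label in reports.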
Finally, we define a calling priority order for these four algorithms: MakeSelfOnto, MakeCoveringB, MakeCoveringA and MakeOnto, listed in decreasing order of priority.

(3) Range-sum queries are inefficient in OLAP systems, and most existing techniques are not sufficient for handling range-sum queries on data cubes. To address this, while also supporting the update and growth of cubes, we propose a new structure called the Nested Dynamic Cube Tree (NDC-tree). NDC-trees support the update, insertion, splitting and extension of nodes. Moreover, an NDC-tree does not require dimension domains to be continuous. It can therefore support the growth of data cubes in which some dimension domains are discrete. We then give several algorithms based on NDC-trees. CalRangeSum computes the results of range-sum queries on cubes, and InsertCell supports the update and insertion of cells. Example analyses show that NDC-trees are feasible for supporting range-sum queries on cubes in which the domains of some dimensions are discrete.

(4) To model ETL processes, this thesis proposes a new conceptual model based on CommonCubes, which represent the structures of cubes. The model is more powerful than other ETL models in expressing the multidimensional characteristics of OLAP data. As an intermediate object, CommonCube is compatible with various data warehouse models. It also frees the design of ETL processes from over-dependence on the target DWMS, so designers can concentrate on data transformation rather than data loading. Moreover, the conceptual model can be combined with other ETL logical models to develop powerful ETL tools.
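The range-sum queries targeted by NDC-trees are classically answered with prefix sums, the idea that techniques such as RPC extend. The following minimal sketch illustrates that baseline and is not the NDC-tree. The product members, the month columns and the cell values are hypothetical. A prefix-sum array needs each dimension laid out as dense integer positions, so a discrete dimension such as products must first be mapped to positions.

```python
def build_prefix_sums(cube):
    """P[i][j] = sum of cube[0..i-1][0..j-1]; padded with a zero row/column."""
    rows, cols = len(cube), len(cube[0])
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            P[i + 1][j + 1] = cube[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P


def range_sum(P, r1, c1, r2, c2):
    """Sum of cube[r1..r2][c1..c2] (inclusive) using four lookups."""
    return P[r2 + 1][c2 + 1] - P[r1][c2 + 1] - P[r2 + 1][c1] + P[r1][c1]


# A discrete dimension must first be mapped to dense positions.
products = ["apple", "kiwi", "pear"]          # assumed member ordering
pos = {p: k for k, p in enumerate(products)}

# Rows follow `products`; columns 0..3 stand for months 1..4 (made-up data).
cube = [[3, 1, 4, 1],
        [5, 9, 2, 6],
        [5, 3, 5, 8]]
P = build_prefix_sums(cube)

# Sales of apple and kiwi in months 2..3 (columns 1..2).
print(range_sum(P, pos["apple"], 1, pos["kiwi"], 2))  # 16
```

Queries take constant time. However, inserting a new product between "apple" and "kiwi" shifts every later position and forces `P` to be rebuilt. This growth on a discrete domain is the situation NDC-trees are described as supporting.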
Keywords: Data Warehouse, On-Line Analytical Processing, Cube, Irregular Dimension, Transforming Algorithm, Range-Sum Query, ETL, Conceptual Model