Font Size: a A A

Research On High Performance Data Cube And Its Semantics

Posted on:2011-12-09Degree:DoctorType:Dissertation
Country:ChinaCandidate:Z B ShiFull Text:PDF
GTID:1118360302470477Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Data cube is the main means to implement On Line Analytical Processing (OLAP). With sharply increasing of number of data dimensions and data scale, operation cost on it is very high and optimization should be done for data cube. The current technologies of data cube include partially materialization, index methods, approximate query, data compress, data reduction, online aggregation and so on.The Formal Concept Analysis (FCA) is a mathematical tool for analysis based on formal concept and concept hierarchy and has been widely used in the fields of knowledge discovery and data analysis. A comparative examination of data cube lattice and formal concept lattice shows that each of them is based on order-structure and furthermore if base table in data warehousing is looked as a concept context, equivalent character groups corresponding to concept in FCA and cover equivalence classes of cells in data cube have the same partitions for data. The theories of FCA and concept lattice are introduced into the research of data cube in this paper firstly and research on high performance data cube and implementation of semantics in data cube are carried out in succession. Study shows that introduce of FCA theories into data cube can provide a new powerful analytical tool for data cube. And using this tool and beginning with data characteristics, high performance data cube that has simple structure, low cost and small size can be obtained. Semantics in data cube can be understood deeply and be realized easily. Some achievements are obtained as follows:(1)Data cube structures based on concept lattice are put forward.Firstly, correlations of concept lattice and data cube lattice are analyzed and concept lattice is applied to data cube lattice. Aggregate Concept and Aggregate Concept Lattice (ACL) are brought forward in this paper. ACL can contain all aggregations of cube integrally. As a same aggregate value possessed by several cells is recorded with one aggregate concept, ACL can reduce data cube in same ratio with queotient cube. In addition relations of generalization and specialization among aggregate concepts in ACL can express the hierarchical relations among cells after cube reduction and keep semantic relations of data cube.Secondly, based on ACL, Reductive Aggregate Concept structure (RAC) is proposed in this paper. Based on properties of G partial relations in theories of FCA and completeness of base table in data warehousing, the one to one corresponding of object concepts derived from base table in ACL and tuples in base table is found. So base table can be looked as set of all object concepts. All object concepts and special (Ω, null) are removed based on ACL in RAC. Combined with base table, RAC is still a fully pre-computed cube and can realize reduction of data cube in more high ratio compared with ACL and quotient cube. And in RAC hierarchical relations among non-object concepts can be still preserved.Thirdly, based on properties of M partial relations in theories of FCA, an efficient method of query answering in ACL and RAC is proposed. The search path in ACL and RAC when query can be determined using intent of attribute concept m" avoiding whole scope searching and query can be achived effectively.Lastly, formal context is discussed and attribute reductive theories of concept lattice are applied to research of data cube. The base table can be reduced and raletive operations of data cube can be simplified through removing unnecessary attributes and combining relative necessary attributes.(2) The attribute implicit relations in formal context are studied and based on relational system, Reductive Aggregate Concept structure based on Attribute Implication (RAC-AI) is proposed in this paper.According to theories of FCA, two kinds of nontrivial attribute implication which describes concept lattice are discussed. One is attribute implication in which preconditions are real-preconditions and the other is attribute implication in which preconditions are quasi-intensions. The intent of concept can be obtained through attribute implications but not concept lattice. Based on RAC two structure of RAC-AI that preconditions are real-preconditions and quasi-intensions are put forward. In RAC-AI discarding the complex structure of concept lattice, the table of attribute implication is added that record all nontrivial attribute implication and all non-object aggregate concepts are stored using mainstream relational system. Both theories and experiments show that RAC-AI has simple structure and small cube size. The costs of construction and maintenance incrementally are low, too. Furthermore, speed of response for query is high. So RAC-AI is a good structure of cube with high integrative performance.(3)The semantic relations among cells in data cube are more important for efficient query and OLAP. In this paper implementation of semantic operation in data cube based on theories of FCA and concept lattice is studied.Firstly, purification and reduction of formal context are discussed to remove redundant information. The current semantic research did not consider to reduct data themselves and plenty of redundant information left over in data disturb semantic finding and understanding.Secondly, based on properties of M partial relations in theories of FCA, M partial relations can be used as heuristic rules for concept hierarchy and semantic concept hierarchy can be obtained at attribute level but not dimension level liking in current methods.Thirdly, using properties of M partial relations and nontrivial attribute implication in theories of FCA, semantic meaning of roll-up and drill-down among data cells is realized. Through analyzing properties of upper bound and lower bound of equivalent character groups, computational methods of upper bound and lower bound are proposed to obtaine the structure of equivalent character groups and then obtain semantic meaning of roll-up and drill-dwon among data cells whose aggregates are same in cube. Based on properties of M partial relations and nontrivial attribute implication, intent of parents and children concept of any concept can be obtained and then obtain semantic meaning of roll-up and drill-down among data cells whose aggregates are different in cube. This method avoids depending any special structure and achieves roll-up and drill-down operations starting from any data cell. Roaming can be done in data cube through repeating this process and full data cube is not necessary. The current methods of semantic data cube were achieved at view level commonly or achieved at cell level depending on complex special structure.(4) Range query is a very effective tool to analyze data for Multi-dimensional data cubes. Pre-computing is one mean to improve range query. This paper work on partition scheme and analyze how to organize cells and carry on prefix sum in partitions and propose a Prefix Region data Cube structure (PRC). PRC partitions data cube into several irregular boxes in favor of pre-computing of prefix region. Technology of recursive partition is used to partition data cube and organize cells of partitions to carry prefix sum in PRC. For data cube with d dimensions (suporsed that the cardinality of each dimension is n) without adding any space overhead compared to storing the original array both of the range query and update costs of PRC are O(lgo~dn).
Keywords/Search Tags:Data Warehousing, Data Cube, Aggregate, Formal Concept Analysis, Concept Lattice, Attribute Implication, Semantics, Range Query
PDF Full Text Request
Related items