Study On The Construction Of Data Cubes Supporting Efficient Queries

Posted on: 2009-10-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F L Leng
GTID: 1118360308978428
Subject: Computer software and theory
Abstract/Summary:
With the development of digital technology and the spread of computer applications, many enterprises and organizations now use computers and related information technologies to manage their data. Computers have strong capabilities for collecting, storing, and processing data. The data that operational systems collect and accumulate year after year, such as production monitoring data, medical data, vital statistics, financial and economic data, and marine data, are valuable assets of these enterprises. Managing these data efficiently and effectively, and mining patterns from them to support production and marketing decisions, has become increasingly important, and data warehouse technology arose in response. Data warehouses and the applications built on them are active research topics in both academia and industry. In the network age, the rapid growth of networks has changed, and continues to change, how people live and think. Individuals and enterprises can draw on information resources from all over the world when making decisions. People no longer merely search for and access data; they also look for information and knowledge in the data to support decision making. OLAP and data mining technologies built on data warehouses provide the means to obtain this information and knowledge.
Data warehouse and OLAP technologies are both based on the multidimensional data model. The multidimensional data model is an analysis-oriented conceptual model that can express analysis goals directly, and it views data in the form of a data cube. The ultimate purpose of data warehouses and OLAP is to support decision making, which requires fast and exact answers to queries on data cubes. The construction of data cubes is therefore important. This dissertation studies the construction of data cubes and related technologies.
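To make "constructing a data cube" concrete: a full cube over n dimensions consists of 2^n cuboids (group-bys), one for each subset of the dimensions. The Python sketch below computes SUM for every cuboid by brute force over an invented three-row sales table; the function `full_cube` and the `"*"` marker for an aggregated-away (ALL) dimension are illustrative assumptions, not the dissertation's algorithms. Its only purpose is to show what has to be computed and stored, which is why materialized view selection, indexing, and compression all matter.

```python
from collections import defaultdict
from itertools import combinations

def full_cube(rows, dims, measure):
    """Brute-force full cube: SUM(measure) for every subset of dims.
    Cells use '*' for dimensions that are aggregated away (ALL)."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            cells = defaultdict(int)
            for row in rows:
                key = tuple(row[d] if d in group else "*" for d in dims)
                cells[key] += row[measure]
            cube[group] = dict(cells)
    return cube

# Invented toy fact table, for illustration only.
sales = [
    {"city": "Beijing", "year": 2008, "amount": 10},
    {"city": "Beijing", "year": 2009, "amount": 15},
    {"city": "Shanghai", "year": 2009, "amount": 7},
]
cube = full_cube(sales, ["city", "year"], "amount")
print(len(cube))                          # 4 cuboids for 2 dimensions
print(cube[()][("*", "*")])               # 32 (grand total)
print(cube[("city",)][("Beijing", "*")])  # 25
```

The number of cuboids doubles with every added dimension, which is the source of the storage and query-time pressure the dissertation addresses.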
The dissertation focuses on the following issues.
First, to address the materialized view selection problem in current data warehouse systems, this dissertation proposes a dynamic materialized view selection algorithm based on query patterns (BQP). In BQP, the selection and adjustment of materialized views take into account both the space limit and users' previous query patterns. Every view is assigned a weight: the more often a view is accessed, the larger its weight and the more likely it is to be materialized. Compared with traditional materialized view selection algorithms, BQP greatly improves the hit rate.
Second, for high-dimensional, low-cardinality datasets, the dissertation proposes an indexing technique consisting of a compressed bitmap index together with two algorithms, one for cube construction and one for querying. Bit-AND operations on the compressed bitmap index are very fast, and the introduction of a start valid pointer and an end valid pointer greatly reduces both the number of bit-AND operations and memory consumption. Compared with the Frag-Cubing algorithm, the algorithm based on the compressed bitmap index reduces computation time by 30% and storage space by more than 25%.
Third, to speed up queries, the dissertation presents two novel clustering algorithms based on the paging partition strategy of Dwarf, a highly compressed cube construction algorithm that eliminates prefix and suffix structural redundancies. The recursion clustering algorithm optimizes point queries, which traverse the Dwarf depth-first; the hierarchical clustering algorithm optimizes range queries, which traverse the Dwarf breadth-first. A logical clustering mechanism is designed to make updates and cluster maintenance easier.
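The view-weighting idea behind BQP can be sketched as follows. The abstract does not give the weighting formula or the selection rule, so this is a minimal sketch under stated assumptions: the class `ViewSelector`, the +1 weight per access, the `decay` aging factor, and the greedy "highest weight per unit of space first" rule are all hypothetical choices, and view sizes are assumed to be known in advance.

```python
class ViewSelector:
    """Hypothetical sketch: access-weighted materialized view selection
    under a space limit (not the dissertation's exact BQP algorithm)."""

    def __init__(self, view_sizes, space_limit, decay=0.9):
        self.sizes = dict(view_sizes)   # view name -> storage cost (assumed known)
        self.space_limit = space_limit
        self.decay = decay              # assumed aging factor for old query patterns
        self.weights = {v: 0.0 for v in self.sizes}
        self.materialized = set()

    def record_query(self, view):
        # Each access raises the weight of the view that answers the query.
        self.weights[view] += 1.0

    def adjust(self):
        # Re-select views greedily by weight per unit of space until the limit is hit.
        ranked = sorted(self.sizes, key=lambda v: self.weights[v] / self.sizes[v],
                        reverse=True)
        chosen, used = set(), 0
        for v in ranked:
            if self.weights[v] > 0 and used + self.sizes[v] <= self.space_limit:
                chosen.add(v)
                used += self.sizes[v]
        self.materialized = chosen
        # Age the weights so that recent query patterns dominate the next round.
        for v in self.weights:
            self.weights[v] *= self.decay
        return chosen

# Invented view sizes and query log, for illustration only.
sel = ViewSelector({"city": 40, "year": 10, "city_year": 100}, space_limit=60)
for view in ["city", "city", "year", "city_year"]:
    sel.record_query(view)
print(sorted(sel.adjust()))  # ['city', 'year']
```

Calling `adjust()` periodically is what makes the selection dynamic: views that stop being queried lose weight and eventually give up their space to views matching the current query pattern.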
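One plausible reading of the start and end valid pointers is sketched below: each bitmap stores only the words between its first and last non-zero word, so an AND of two bitmaps only has to touch the overlap of their valid ranges. The class `TrimmedBitmap`, the 64-bit word size, and this trimming scheme are assumptions for illustration; the abstract does not describe the dissertation's actual compression format.

```python
class TrimmedBitmap:
    """Hypothetical sketch: a bitmap that stores only the words between its
    start and end valid pointers (first and last non-zero 64-bit words)."""
    WORD = 64

    def __init__(self, start, words):
        self.start = start                  # start valid pointer (word index)
        self.words = words                  # stored words; zero words outside are dropped
        self.end = start + len(words) - 1   # end valid pointer (start - 1 if empty)

    @classmethod
    def from_rows(cls, rows):
        # Build the bitmap for one attribute value from the ids of matching rows.
        if not rows:
            return cls(0, [])
        first, last = min(rows) // cls.WORD, max(rows) // cls.WORD
        words = [0] * (last - first + 1)
        for r in rows:
            words[r // cls.WORD - first] |= 1 << (r % cls.WORD)
        return cls(first, words)

    def __and__(self, other):
        # Only the overlap of the two valid ranges can contain 1-bits,
        # so words outside [lo, hi] are never ANDed.
        lo, hi = max(self.start, other.start), min(self.end, other.end)
        if lo > hi:
            return TrimmedBitmap(0, [])
        words = [self.words[i - self.start] & other.words[i - other.start]
                 for i in range(lo, hi + 1)]
        # Re-trim so the result's pointers mark its own first/last non-zero words.
        nz = [i for i, w in enumerate(words) if w]
        if not nz:
            return TrimmedBitmap(0, [])
        return TrimmedBitmap(lo + nz[0], words[nz[0]:nz[-1] + 1])

    def rows(self):
        # Decode the set bits back into row ids.
        return [(self.start + i) * self.WORD + b
                for i, w in enumerate(self.words)
                for b in range(self.WORD) if (w >> b) & 1]

a = TrimmedBitmap.from_rows([3, 70, 130, 500])  # e.g. rows with city = 'Beijing'
b = TrimmedBitmap.from_rows([70, 131, 200])     # e.g. rows with year = 2009
c = a & b
print(a.start, a.end, b.start, b.end)  # 0 7 1 3
print(c.rows())                        # [70]
```

Here `a` spans eight words but the AND only visits the three words in `b`'s valid range, which is the kind of saving in bit-AND operations and memory that the pointers are meant to provide.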
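The intuition for pairing depth-first layout with point queries and breadth-first layout with range queries can be shown with a toy page-packing experiment. This sketch is a simplification: it treats the structure as a plain tree (a real Dwarf is a DAG whose suffix-coalesced nodes are shared), and the functions `dfs_order`, `bfs_order`, `paginate`, and `pages_touched`, as well as the page capacity, are hypothetical.

```python
from collections import deque

def dfs_order(tree, root):
    """Pre-order (depth-first) sequence: a root-to-leaf path stays contiguous."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(tree.get(node, [])))
    return order

def bfs_order(tree, root):
    """Level-by-level (breadth-first) sequence: nodes on one level stay contiguous."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree.get(node, []))
    return order

def paginate(order, page_capacity):
    """Assign nodes to disk pages in the given order: node -> page number."""
    return {node: i // page_capacity for i, node in enumerate(order)}

def pages_touched(page_of, nodes):
    """Number of distinct pages a query must read to visit these nodes."""
    return len({page_of[n] for n in nodes})

tree = {"root": ["A", "B"], "A": ["a1", "a2"], "B": ["b1", "b2"]}
dfs_pages = paginate(dfs_order(tree, "root"), page_capacity=3)
bfs_pages = paginate(bfs_order(tree, "root"), page_capacity=3)
point = ["root", "A", "a1"]  # one root-to-leaf path (point query)
level = ["A", "B"]           # all nodes on one level (range-style scan)
print(pages_touched(dfs_pages, point), pages_touched(bfs_pages, point))  # 1 2
print(pages_touched(dfs_pages, level), pages_touched(bfs_pages, level))  # 2 1
```

The depth-first layout reads one page for the path but two for the level scan, and the breadth-first layout does the opposite, which mirrors the trade-off between the two clustering algorithms.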
Compared with the original Dwarf, the recursion clustering algorithm is better suited to point queries and the hierarchical clustering algorithm is better suited to range queries; both improve response time and reduce the number of I/O operations.
Fourth, a new data cube model oriented toward dynamic generalization analysis is proposed to support dynamic generalization analysis queries. Data generalization is a process that abstracts a large set of task-relevant data from a relatively low conceptual level, such as individual age values, to higher conceptual levels, such as youth, middle age, and old age. The traditional model cannot answer dynamic generalization analysis queries flexibly, nor can it use precomputed materialized views to accelerate them. To overcome these shortcomings, the new model extends the definitions of dimensions and fact tables in the traditional model. For dynamic generalization analysis queries, the new model outperforms the traditional model in response time, user satisfaction, and flexibility.
Finally, based on an analysis of the characteristics of point queries and range queries on Dwarf and of the Windows disk system, a user-defined I/O buffer mechanism is proposed to speed up queries on data cubes. In a system using the user-defined I/O buffer, the dimensions are re-selected and frequently queried nodes are placed in the user-defined buffer. The user-defined buffer improves query performance greatly.
In conclusion, the dissertation studies the construction of data cubes and related issues. It proposes a dynamic materialized view selection algorithm based on query patterns, a compressed bitmap index, and two novel clustering algorithms based on the paging partition strategy of Dwarf, and it presents a new data cube model oriented toward dynamic generalization analysis.
Together, these contributions address key problems in the construction and querying of data cubes. Extensive theoretical analysis and experiments show that the proposed algorithms and methods are efficient and effective. They provide a sound foundation for the future construction and querying of data cubes in data warehouses and can contribute to the construction and development of decision support systems based on data warehouses.
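For the dynamic generalization analysis contribution, the core operation is mapping low-level values (ages) to concepts (youth, middle age, old age) whose boundaries the user picks at query time. The sketch below is one possible illustration, not the dissertation's extended model: it materializes counts once at the lowest conceptual level and then answers any query-time generalization by rolling those counts up. The function names, the cut points, and the sample ages are invented.

```python
from bisect import bisect_right
from collections import Counter

def precompute_base(ages):
    """Materialize counts once at the lowest conceptual level (one cell per age)."""
    return Counter(ages)

def rollup(base_counts, cut_points, labels):
    """Answer a generalization query from the precomputed base counts.
    cut_points (sorted) and labels are chosen by the user at query time;
    a value v falls into labels[i], where i = number of cut points <= v."""
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    result = Counter()
    for age, n in base_counts.items():
        result[labels[bisect_right(cut_points, age)]] += n
    return result

base = precompute_base([23, 35, 41, 67, 18, 58])  # invented ages
labels = ["youth", "middle age", "old age"]
r1 = rollup(base, [35, 60], labels)  # counts: youth 2, middle age 3, old age 1
r2 = rollup(base, [40, 60], labels)  # counts: youth 3, middle age 2, old age 1
print(r1)
print(r2)
```

Because the base counts are computed once, a user can change the concept boundaries between queries without rescanning the fact table.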
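The user-defined I/O buffer keeps frequently queried Dwarf nodes in memory. The abstract does not specify the admission or eviction policy, and the dissertation's design also depends on the Windows disk system, which this sketch ignores. As a hedged illustration only, the hypothetical class `HotNodeBuffer` below admits a node when it has been queried more often than the coldest cached node; `read_from_disk` stands in for whatever routine actually loads a node.

```python
from collections import Counter

class HotNodeBuffer:
    """Hypothetical sketch of an application-level (user-defined) buffer that
    keeps the most frequently queried nodes in memory."""

    def __init__(self, read_from_disk, capacity):
        self.read_from_disk = read_from_disk  # assumed callable: node_id -> node data
        self.capacity = capacity
        self.freq = Counter()                 # how often each node has been queried
        self.cache = {}
        self.disk_reads = 0

    def get(self, node_id):
        self.freq[node_id] += 1
        if node_id in self.cache:
            return self.cache[node_id]
        data = self.read_from_disk(node_id)
        self.disk_reads += 1
        self._admit(node_id, data)
        return data

    def _admit(self, node_id, data):
        if self.capacity <= 0:
            return
        if len(self.cache) < self.capacity:
            self.cache[node_id] = data
            return
        # Replace the least frequently queried cached node only if the new one is hotter.
        coldest = min(self.cache, key=lambda n: self.freq[n])
        if self.freq[node_id] > self.freq[coldest]:
            del self.cache[coldest]
            self.cache[node_id] = data

buf = HotNodeBuffer(lambda n: f"node-{n}", capacity=2)
for n in [1, 1, 2, 3, 1, 2, 1, 3, 3, 3]:  # invented query trace
    buf.get(n)
print(buf.disk_reads)     # 5 (instead of 10 without the buffer)
print(sorted(buf.cache))  # [1, 3]
```

A frequency-based rule is used here because the text says the buffer holds nodes that are queried often; a real implementation might instead pin nodes chosen from query statistics ahead of time.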
Keywords/Search Tags: data warehouse, data cube, OLAP, materialized views, bitmap index, clustered Dwarf, generalization analysis, user-defined buffer