Font Size: a A A

Query Optimization Using Materialized Views Based On Evolutionary Computation In Data Warehouse

Posted on:2011-05-14Degree:DoctorType:Dissertation
Country:ChinaCandidate:J ChenFull Text:PDF
GTID:1228330332982900Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
A data warehouse is, by definition, a subject-oriented, integrated, time-variant collection of data, with the purpose of efficiently supporting decision support queries. As data warehouse keep growing in size, these complex decision queries which involve large volumes of stored data are very time-consuming. Inefficient data processing means of waste of resources, therefore, higher efficiency of data query is very important to data warehouse, also is a major goal in designing data warehouse.In order to improve query performance, it is an effective method to use auxiliary data to answer queries. In modern data warehouse systems, a common type of auxiliary data is materialized views:relations that are computed by answering certain queries on the (original) stored data in the database and that can be used to provide, without time-consuming runtime transformations, "precompiled" information that is relevant to the user query.To improve the efficiency of queries based on materialized views, there are two important issues:view-selection problem and optimal rewritten query using materialized views. This doctoral thesis focuses on these two issues and presents comprehensive solutions to both problems.View-selection problem:Given a set of queries to be supported, the view selection problem is to select a set of views to materialize minimizing the query response time given some resource constraint. Firstly, this thesis generally analyses the characteristic of view-selection problem in data warehouse system, the principle and the difficulty of the optimization, and malpractice of the traditional view selection methods, systematically introduces the characteristics and methods of estimation of distribution algorithms, and from the "macro" level, builds a mathematical model of population in the view-selection problem; meanwhile, two hybrid genetic algorithm are proposed, during the process of generate offspring, we can take advantage of the global statistical information and local information to overcome the shortcomings of GA and EDAs to more effectively address the complex view-selection problem. We ran some experiments to determine the quality of the solution delivered and the efficiency of the UMDA, GEDA and BMUTGA algorithms. Three algorithms named as UMDA, GEDA and BMUTGA are proposed in this paper, and ran them on the dataset in TPC-D benchmark and a variety of synthetic datasets. The experimental results show that the proposed algorithms outperform canonical genetic algorithm under different space constraints, different queries distributions and view size distributions.Optimal rewritten query using materialized views:This problem is, given a user’s query Q and a set of materialized views V, to find an optimal (equivalent) rewriting R of Q that is composed of views in V. Firstly, compared with the traditional query optimization, this thesis roundly analyses the characteristic of using materialized views to optimal (equivalent) rewriting in data warehouse system, the principle and the difficulty of the optimization, and malpractice of the traditional methods to the optimal query rewriting using materialized views, systematically introduce the characteristics of heuristic method and genetic programming, and analyzes the possibility with the heuristic method and genetic programming in optimal query rewriting using materialized views. Secondly, we propose two novel view-based optimal query rewriting algorithms BSHS and SRGP to address this problem, and make the following contributions. (1) The algorithms proposed in this paperunder bag-set semantic that have been widely used in today’s relational databases. Under this semantic, the base relations (i.e., data tables) do not contain duplicate tuples, while the query results may contain duplicate tuples. (2) The algorithms proposed in this paper for a large and practically important subset of SQL queries, which includes not only conjunctive (Select-Project-Join) queries but also aggregate (SUM/COUNT) queries. (3) The algorithms proposed in this paper explore multiple-view rewritings as well as single-view rewritings of the query to guarantee the approximate optimality of the rewritings they produce. This paper ran some experiments to determine the quality of the solution delivered and the time taken in practice by the heuristic method BSHS and rule-based genetic programming (SRGP) algorithms. We implemented both the algorithms, and ran them on a variety of synthetic datasets. The experimental results show that the proposed algorithms are effective.
Keywords/Search Tags:Data Warehouse, Materialized View, Query Optimization, Evolutionary Computation
PDF Full Text Request
Related items