Research Of Distributed Data Cube Partial Materialization Method Based On Genetic Algorithm

Posted on: 2017-02-19  Degree: Master  Type: Thesis
Country: China  Candidate: T T Luo
GTID: 2348330533450146  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of cloud computing, big data, and related technologies, the volume of data in many fields has grown exponentially. Analyzing and processing this data to extract potentially valuable information is an important task in the data warehouse field. On-line analytical processing (OLAP) supports this by letting analysts observe and analyze the data from different angles and at different levels of detail. However, for large-scale, rapidly growing data, the traditional stand-alone mode, with its limited computing and storage resources, struggles to analyze and process the data and is too slow to meet application requirements. Data cube materialization, a technique for improving OLAP query performance, is widely used in data warehouses of all kinds.

Although data cube materialization speeds up OLAP query response, materializing the entire data cube requires a large amount of computing resources, storage space, and maintenance effort. It is therefore usually more efficient to select only a subset of the cube's cuboids to materialize. Partial materialization raises two main problems: how to select a reasonable set of materialized views over massive data, and how to adjust the materialized views as the queries change.

To address these problems, this thesis proposes a parallel materialized view selection method based on a genetic algorithm, and an adaptive algorithm that adjusts the materialized view set according to the query log. The first method exploits the genetic algorithm's inherent parallelism and powerful global search capability, recasting materialized view selection as an optimization problem whose solution is found by the genetic algorithm. Because both response time and storage cost must be considered during data cube materialization, the MapReduce parallel computing framework is introduced to improve performance.
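The genetic-algorithm selection step described above can be sketched as follows. This is a minimal, hypothetical illustration, not the thesis's implementation: the three-dimension cube, the cuboid sizes, the query frequencies, and the storage budget are all invented; query cost follows the common linear cost model (a query is answered from the smallest materialized cuboid that contains its dimensions); and the GA operators (bit-vector encoding, tournament selection, one-point crossover, bit-flip mutation, elitism) are generic choices. Fitness evaluations run serially through Python's built-in `map`; because each chromosome is scored independently, that is one plausible step to distribute as the map phase of a MapReduce job.

```python
import random

# Hypothetical 3-dimension cube (A, B, C): each cuboid is the set of grouping
# dimensions it keeps, mapped to an assumed row count.
SIZE = {
    frozenset("ABC"): 10_000,  # base cuboid, always materialized
    frozenset("AB"): 2_000,
    frozenset("AC"): 5_000,
    frozenset("BC"): 800,
    frozenset("A"): 50,
    frozenset("B"): 100,
    frozenset("C"): 20,
    frozenset(): 1,            # apex cuboid (grand total)
}
BASE = frozenset("ABC")
CANDIDATES = [c for c in SIZE if c != BASE]  # one gene per non-base cuboid
# Hypothetical workload: how often each cuboid is queried.
QUERY_FREQ = {frozenset("AB"): 30, frozenset("A"): 50,
              frozenset("BC"): 10, frozenset("C"): 40}
STORAGE_LIMIT = 2_500  # hypothetical budget (rows) for the extra views


def query_cost(q, materialized):
    """Linear cost model: answer q from the smallest materialized cuboid that
    contains all of q's dimensions (the base cuboid always qualifies)."""
    return min(SIZE[m] for m in materialized | {BASE} if q <= m)


def decode(chrom):
    """A chromosome is a bit vector; bit i = 1 means materialize CANDIDATES[i]."""
    return {c for c, bit in zip(CANDIDATES, chrom) if bit}


def fitness(chrom):
    """Weighted total query cost (lower is better); over-budget sets are infeasible."""
    views = decode(chrom)
    if sum(SIZE[v] for v in views) > STORAGE_LIMIT:
        return float("inf")
    return sum(f * query_cost(q, views) for q, f in QUERY_FREQ.items())


def select_views(pop_size=20, generations=50, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    n = len(CANDIDATES)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    pop[0] = [0] * n  # seed with the empty (always feasible) selection
    for _ in range(generations):
        # Fitness evaluations are independent of each other; this map is the
        # step a MapReduce job could distribute across workers.
        scored = sorted(zip(map(fitness, pop), pop), key=lambda t: t[0])
        next_pop = [chrom for _, chrom in scored[:2]]  # elitism: keep best two
        while len(next_pop) < pop_size:
            # Tournament selection of two parents (size-3 tournaments).
            p1 = min(rng.sample(scored, 3), key=lambda t: t[0])[1]
            p2 = min(rng.sample(scored, 3), key=lambda t: t[0])[1]
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            next_pop.append([b ^ 1 if rng.random() < p_mut else b for b in child])
        pop = next_pop
    best = min(pop, key=fitness)
    return decode(best), fitness(best)


if __name__ == "__main__":
    views, cost = select_views()
    print(sorted("".join(sorted(v)) or "()" for v in views), cost)
```

Seeding the population with the empty selection guarantees a feasible baseline, and elitism carries the best chromosome found so far into every generation, so the returned cost never exceeds the cost of materializing nothing.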
The adaptive adjustment method is triggered each time a fixed number of queries has been executed: it analyzes how the queries have changed using query statistics and then adjusts the materialized views accordingly. Experimental results show that both methods are suited to big data computing environments: the selection method chooses reasonable materialized views, and the adaptive adjustment algorithm adapts the materialized view set as the query workload changes.
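The trigger-and-adjust loop described above can be sketched as follows. This is a hypothetical illustration, not the thesis's algorithm: the class `AdaptiveViewManager`, the window size, the cuboid sizes, and the storage budget are assumptions, and to keep the example short the re-selection step swaps in a simple greedy heuristic (add the view with the largest query-cost reduction per row of storage) instead of re-running the genetic algorithm. It also ignores view maintenance and re-materialization costs.

```python
from collections import Counter


class AdaptiveViewManager:
    """Hypothetical sketch: re-selects materialized views after every
    `window` logged queries, based on query counts from that window."""

    def __init__(self, sizes, base, storage_limit, window):
        self.sizes = sizes                  # cuboid (frozenset of dims) -> row count
        self.base = base                    # base cuboid, always available
        self.storage_limit = storage_limit  # budget for extra views, in rows
        self.window = window                # fixed query count that triggers adjustment
        self.log = Counter()                # query statistics for the current window
        self.seen = 0
        self.views = set()

    def _cost(self, q, views):
        # Answer q from the smallest available cuboid containing its dimensions.
        return min(self.sizes[m] for m in views | {self.base} if q <= m)

    def _workload_cost(self, views):
        return sum(f * self._cost(q, views) for q, f in self.log.items())

    def _reselect(self):
        # Greedy stand-in for the selection step: repeatedly add the view with
        # the largest cost reduction per row of storage that still fits.
        views, used = set(), 0
        while True:
            current = self._workload_cost(views)
            best, best_ratio = None, 0.0
            for v, size in self.sizes.items():
                if v == self.base or v in views or used + size > self.storage_limit:
                    continue
                ratio = (current - self._workload_cost(views | {v})) / size
                if ratio > best_ratio:
                    best, best_ratio = v, ratio
            if best is None:
                return views
            views.add(best)
            used += self.sizes[best]

    def record(self, q):
        """Log one query on cuboid q. Returns True if it triggered an adjustment."""
        self.log[q] += 1
        self.seen += 1
        if self.seen < self.window:
            return False
        self.views = self._reselect()
        self.log.clear()
        self.seen = 0
        return True


if __name__ == "__main__":
    sizes = {frozenset("AB"): 10_000, frozenset("A"): 300,
             frozenset("B"): 200, frozenset(): 1}
    mgr = AdaptiveViewManager(sizes, frozenset("AB"), storage_limit=350, window=4)
    for q in [frozenset("A")] * 4 + [frozenset("B")] * 4:
        if mgr.record(q):
            print(sorted("".join(sorted(v)) for v in mgr.views))
```

With these assumed sizes and a budget that fits only one of the two single-dimension cuboids, the materialized set switches from the A cuboid to the B cuboid once the workload in the second window shifts to queries on B.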
Keywords/Search Tags: data cube, partial materialization, genetic algorithms, MapReduce, adaptive adjustment