| With the development of the information age and the spread of the Internet, processing massive distributed data stored across heterogeneous databases has become a necessity. By integrating the data and hardware of different databases, we can form a heterogeneous database system that is logically unified while remaining physically independent. Such a system increases data redundancy in order to ensure the reliability, availability and performance of cross-database searches and multi-table joins. However, the physically distributed storage of data and the need for data redundancy make query processing more difficult, so query optimization plays an important role in heterogeneous databases. In this paper, we analyze the schema structure and query processing of heterogeneous databases. In the optimization process of a heterogeneous database system, the global query must be converted into a global query tree, which is then decomposed into local query trees using equivalence rules in order to determine the final query execution plan. This query processing should take both local response time and transmission cost into account. From a study of the query optimization algorithms of centralized databases and distributed homogeneous databases, we find that existing algorithms can only minimize local response time. Moreover, when a dynamic algorithm is used for query optimization, the limited amount of computation can leave it trapped in a local optimum, so that it fails to reach the global optimum given the characteristics of heterogeneous databases. To address this problem, the following work is done in this paper: 1. We design the connection (join) cost function, the result operation cost function and the transmission cost function of the estimation model according to the query characteristics of heterogeneous databases. 
In this work, the connection cost is treated as the key factor in heterogeneous database query optimization. We then design a global data dictionary and a dynamic data dictionary suited to the characteristics of the genetic algorithm, and test the feasibility and soundness of this estimation model. 2. We improve the traditional genetic algorithm by proposing a new encoding based on the query tree and by attaching a value structure to each encoded individual, so that the contents of the dynamic data dictionary can be stored in the encoding through the value structure. This ensures that no trivial solutions are generated during inheritance. To generate new individuals, we also adapt the mutation operation of the traditional genetic algorithm to the tree structure of the query tree. Two new mutation operators ensure that the tree structure of the chromosome string is preserved after mutation and that enough new individuals are produced. The new encoding compensates for the crossover operator's inability to provide sufficient diversity among individuals. 3. Because both local response time and transmission cost must be considered when querying a heterogeneous database, we design the fitness function of the new algorithm to minimize transmission cost and query cost. The fitness function is evaluated through a virtual query: each individual represents a query execution, and the function dynamically simulates that execution using the factors recorded in the dynamic data dictionary and computes its cost. This ensures that the optimal solution selected by the fitness function is realistic and valid. 4. Finally, we demonstrate experimentally that the genetic-algorithm-based query optimization algorithm for heterogeneous databases is feasible. 
Drawing on existing research on genetic algorithms, we evaluate and select among commonly used values of the initial population size, crossover probability and mutation probability, and settle on a set of control parameters as the test baseline for heterogeneous database query optimization. By repeatedly comparing the experimental results, we further adjust the relevant parameter settings to achieve efficient heterogeneous database query optimization. To sum up, the improved genetic algorithm yields a query optimization algorithm with higher performance for cross-database multi-table queries in heterogeneous databases. Further research and validation are still needed on the dynamic selection of control parameters and on the setting of some important values, in the hope of obtaining a query optimization algorithm with even higher performance through continued improvement. |
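
The ideas summarized above (a query-tree encoding, mutation operators that preserve the tree structure, and a fitness function that simulates the query) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the `STATS` table standing in for the data dictionary, the constants `SELECTIVITY`, `JOIN_COST_PER_PAIR` and `TRANSFER_COST_PER_ROW`, the two mutation operators (leaf swap and rotation), and the simplified selection loop, which omits crossover.

```python
import copy
import random
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    """Query tree node: a leaf names a table, an inner node joins two subtrees."""
    table: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self):
        return self.table is not None

# Hypothetical data dictionary: table -> (row count, site holding the table).
STATS = {"A": (1000, "s1"), "B": (200, "s2"), "C": (50, "s1"), "D": (5000, "s3")}
SELECTIVITY = 0.01            # assumed uniform join selectivity
JOIN_COST_PER_PAIR = 0.001    # assumed cost per compared row pair
TRANSFER_COST_PER_ROW = 2.0   # assumed cost of shipping one row between sites

def evaluate(node):
    """Simulate the plan bottom-up and return (rows, site, accumulated cost)."""
    if node.is_leaf():
        rows, site = STATS[node.table]
        return rows, site, 0.0
    lrows, lsite, lcost = evaluate(node.left)
    rrows, rsite, rcost = evaluate(node.right)
    cost = lcost + rcost
    site = lsite
    if lsite != rsite:
        # Ship the smaller operand to the site of the larger one.
        cost += min(lrows, rrows) * TRANSFER_COST_PER_ROW
        site = rsite if lrows <= rrows else lsite
    cost += lrows * rrows * JOIN_COST_PER_PAIR
    return max(1, int(lrows * rrows * SELECTIVITY)), site, cost

def fitness(tree):
    """Higher is better: inverse of the simulated total cost."""
    return 1.0 / (1.0 + evaluate(tree)[2])

def leaves(node):
    return [node] if node.is_leaf() else leaves(node.left) + leaves(node.right)

def inner_nodes(node):
    if node.is_leaf():
        return []
    return [node] + inner_nodes(node.left) + inner_nodes(node.right)

def mutate_swap_leaves(tree, rng):
    """Exchange the tables of two leaves; the tree shape is unchanged."""
    a, b = rng.sample(leaves(tree), 2)
    a.table, b.table = b.table, a.table

def mutate_rotate(tree, rng):
    """Rewrite ((x JOIN y) JOIN z) as (x JOIN (y JOIN z)) at a random node."""
    candidates = [n for n in inner_nodes(tree) if not n.left.is_leaf()]
    if candidates:
        n = rng.choice(candidates)
        x, y, z = n.left.left, n.left.right, n.right
        n.left, n.right = x, Node(left=y, right=z)

def random_tree(tables, rng):
    """Build a left-deep join tree over a random permutation of the tables."""
    order = list(tables)
    rng.shuffle(order)
    tree = Node(table=order[0])
    for t in order[1:]:
        tree = Node(left=tree, right=Node(table=t))
    return tree

def optimize(tables, generations=30, pop_size=20, p_mut=0.5, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(tables, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]      # keep the better half unchanged
        children = []
        for parent in survivors:
            child = copy.deepcopy(parent)
            if rng.random() < p_mut:
                rng.choice([mutate_swap_leaves, mutate_rotate])(child, rng)
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)
```

Calling `optimize(["A", "B", "C", "D"])` returns the best join tree found under these assumed statistics. Because both mutation operators only relabel leaves or regroup subtrees, every individual remains a valid binary tree over the same set of tables, which is the property the tree-based encoding is meant to guarantee.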