Font Size: a A A

Research On Data Query Processing And Optimization In Distributed Database

Posted on:2006-12-19Degree:MasterType:Thesis
Country:ChinaCandidate:H GongFull Text:PDF
GTID:2168360155472246Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
Distributed database system is a new technology as a result of which computernetwork is incorporated into database system.The research on distributed databasesystem is mainly focused on how to allocate data fragments and process them just viaa network. With regard to query processing, if in a distributed database managementenvironment, the relations involved in a distributed query may be fragmented and/orreplicated, thereby the overhead cost is not only expressed as a weighted combinationof I/O and CPU, but also including communication costs.Since the critical cost factors are related to the execution of joins reqired in adistributed query, researchers both domestic and overseas have been doing research onoptimization of relation joins and so designed various kinds of methods. And of allthese methods, one method optimizes equijoin queries in diatributed database whererelations are horizontally partitioned using a hash function and has been widelyapplied now. According to a value of hash function, every relation is divided intodistincted horizontal fragmentation after once hash partitioned on it and stored inmany different sites. Thus, different relation all keeps site-relied when joining eachother. On the other hand, there commonly exists repartitioning problem if amultiplicity of relations performs joins. If communication among nodes is slow, thenmost time will be spent in routing data in a network when repartitionings occur. Evenif communication is fast, repartitioning may reduce throughput because it entails moredisk accesses. Although a large amount of researchers have proposed many costmodels and algorithms designed to optimize distributed database queries, some of theirmodels and algorithms are strictly restricted to a certain specific queries, the othernearly do not get a satisfactory order of joins among relations on condition that thequeries are on a large scale.This paper will lucubrate optimization of queries in distributed database whererelations are hash partitioned and bring forward a new heuristic algorithm—a hashpartitioning equijoin optimization heuristic algorithm based on dividing query graph.The properties of this algorithm are as follows.①It introduces the concepts of boundary point and query block respectively.②It introduces two criterions that how to judge a boundary point.③All cost of relation join operations are calculated by cost model based on hashpartitioning. ④The algorithm applies the idea of backtrack. ⑤It takes advantage of Kruskal heuristic algorithm and CHAIN algorithm to optimize some specific query blocks. ⑥Using the optimal result yielded by this algorithm, many join operations can be executed concurrently. This algorithm using the concept of backtrack can divide the query graph intomany query blocks by making a depth-first search for the query graph. Then it willdetermine an effective sequence of join operations for some chosen query blocks bytaking advantage of Kruskal heuristic algorithm. When a round depth-first search isterminated, the algorithm will reconstruct a new query graph by merging all queryblocks divided from former query graph. The next step this algorithm will take is torecursively make a depth-first search for the already rebuilt query graph until thequery graph can not be further divided or all query blocks are unexceptionallycomposed of only one edge. Finally, the algorithm will use CHAIN algorithm togenerate the optimal join sequence if query graph is a chain, otherwise the joinsequence is produced by making use of Kruskal heuristic algorithm. In the end of this paper, the algorithm is realized by experiment. The result of theexperiment is shown that the cost of join operations reqired for distributed queries ismore cheaper in contrast with the traditional method, such as Kruskal heuristicalgorithm.
Keywords/Search Tags:Distributed Database, Query Optimization, Equijoin, Hash Partition, Cost Formulas, Query Graph, Redundant Clause, Boundary Point, Query Block
PDF Full Text Request
Related items