Font Size: a A A

Semijoin Strategy-based Distributed Database Query Optimization Theory And Applications

Posted on:2009-05-31Degree:MasterType:Thesis
Country:ChinaCandidate:F P LiFull Text:PDF
GTID:2208360278970828Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Compared to the great success of Centralized Database system, although many techniques and algorithms have been proposed to optimize queries of distributed database, DDBS (Distributed DataBase System) still has many problems to be resolved, such as how to select the best semi-join without getting temporary result, how to select the best optimized semi-join, and the algorithm complexity and space complexity is so great. These problems are explored in this paper.This paper introduces techniques of semi-join and join published in domestic and overseas, such as SDD-1 Optimization algorithm, Binary-Cut Reduction Algorithm and Genetic Algorithm. Then this paper represents the basic concept of DDBS, and the management processing, aim and total cost of distributed database query. The total cost is equal to CPU cost + I/O cost + communication cost. The operating process and the suitable for use of direct-join and semi-join are analyzed, and the paper gives emphasis to researching and comparing the communication cost of two joins. Direct-Join focuses more on local operation time and considers little about communication cost; while semi-join transmit temporary result instead of original data, which reduces the network communication cost efficiently. Therefore, the emphasis of this paper is query optimization algorithm based on semi-join.On the basis of understanding a great deal of join and semi-join algorithm, the paper selects three well-known algorithms to analyze: SDD-1 algorithm and PERF algorithm which deal with semi-join query optimization, Kruskal algorithm which deal with join query optimization, then this paper proposed a new algorithm named as KPSJ, which is a Krskal-PERF query optimization algorithm based on multi-relations semi-join. Then we proves that the complexity of KPSJ algorithm is smaller than that of other algorithms. KPSJ algorithm has little communication cost than most of semi-join algorithms, and the order of semi-join is the best when its benefit is correct. The KPSJ algorithm was applied to the purchase and sale management information system in the author's company. Through comparing communication cost and average of query time of Join algorithm, KPSJ algorithm and PAP algorithm, it is proved that the KPSJ algorithm reduces distinctly and efficiently the temporary result data number, the network communication total cost and query total cost.
Keywords/Search Tags:distributed database, multi-relations, semi-join, query Optimization
PDF Full Text Request
Related items