Research On Scientific Data Grid Distributed Query Architecture And Its Key Technologies

Posted on: 2007-09-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y C Zhou
Full Text: PDF
GTID: 1118360185454185
Subject: Computer system architecture
Abstract/Summary:
One of the major objectives of data grid research is to share massive heterogeneous data in a distributed model. In such a loosely coupled distributed system, it is challenging to query massive, distributed, heterogeneous data effectively without undermining the autonomy of data resource owners. To address this problem, this dissertation focuses on distributed query processing technologies for a practical Scientific Data Grid environment.

The main contributions of this dissertation are as follows:

1. A distributed query architecture (SDGDQA). SDGDQA is suited to the Scientific Data Grid environment and is divided into three layers: the user interface layer, the mediator layer and the query layer. The user interface layer is responsible for human-computer interaction. It accepts query requests from users, processes these requests and displays the results. It also records and analyzes users' interests and habits. The core of SDGDQA is the mediator layer, whose functions include global optimization of query plans, location of data resources, and auction and allocation of query plans. Bidding for query plans and optimizing them locally are implemented in the query layer. To adapt to the autonomy and dynamic variability of grid nodes, SDGDQA adopts the mobile agent technique. This dissertation describes the migration and communication mechanism between two mobile agents.

2. A data resource location mechanism based on small-world theory. The dissertation first analyzes the small-world network characteristics of the Scientific Data Grid and then builds a hierarchical data resource location model with three layers: virtual organization, institution and grid entity. A message dissemination algorithm based on the Gossip protocol is presented, and a data resource location algorithm is also introduced. The hierarchical data resource location model suits the distributed nature of the grid environment and makes it easy for data resources to join and leave, which results in good scalability. By reducing overhead and enlarging the range of message dissemination, the new message dissemination algorithm can update resource information from other distributed nodes efficiently. The data resource location algorithm chooses nodes purposefully, which greatly improves the efficiency and performance of locating data.

3. A query plan allocation model based on a market mechanism for SDGDQA. In SDGDQA, the query plan is regarded as a grid resource. Because data grids and market mechanisms share some characteristics, a market-based query plan allocation model is proposed. This dissertation describes the architecture, the specific economic model, and several protocols and algorithms of this model, and analyzes the benefits to the auctioneer and the bidders. Analytical results show that the model has the following advantages: (a) it is user-centered; (b) query resource providers can access the query plan fairly; (c) query resource providers are encouraged to contribute their idle resources, which helps balance supply and demand and build a large-scale grid resource allocation system; (d) query plan providers can express their requirements and objectives; (e) query resource providers and query plan providers can each make decisions according to their own conditions and maximize their own interests.

4. A query optimization algorithm based on genetic programming for SDGDQA. The algorithm represents the query plan as a query tree, which therefore allows more flexible forms of expression. All elements in the search space are LBT trees, which makes the search space far smaller than one made up of GBT trees. A penalty function is used to penalize LBT trees that contain a Cartesian product. Homologous crossover is used as the crossover operator, so that context is taken into account. Experimental results indicate that the algorithm has an advantage when the query plan contains more relations. Finally, the dissertation analyzes the convergence of the algorithm using Markov chain theory and proves that the algorithm converges when the best-individual preservation strategy is used.
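As an illustration of the Gossip-based message dissemination mentioned in contribution 2, the following is a minimal sketch of a generic push-style gossip round loop. It is not the dissertation's algorithm: the function name gossip_spread, the fanout parameter, the neighbor map and the ring topology in the usage lines are all assumptions made for this example.

```python
import random
from typing import Dict, List, Set, Tuple


def gossip_spread(
    neighbors: Dict[int, List[int]],
    origin: int,
    fanout: int = 2,
    max_rounds: int = 50,
    seed: int = 0,
) -> Tuple[Set[int], int]:
    """Generic push gossip (illustrative only).

    In each round every informed node forwards the message to at most
    `fanout` randomly chosen neighbors. Returns the informed nodes and
    the number of rounds that were run.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    informed = {origin}
    rounds = 0
    while len(informed) < len(neighbors) and rounds < max_rounds:
        newly_informed = set()
        for node in sorted(informed):
            peers = neighbors.get(node, [])
            # Push to a random subset of this node's neighbors.
            for peer in rng.sample(peers, min(fanout, len(peers))):
                if peer not in informed:
                    newly_informed.add(peer)
        # Nodes informed in this round start forwarding in the next one.
        informed |= newly_informed
        rounds += 1
    return informed, rounds


# Hypothetical 8-node ring: every node knows its two adjacent nodes.
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
informed, rounds = gossip_spread(ring, origin=0, fanout=2)
print(sorted(informed), rounds)
```

With a fanout equal to the number of neighbors, as in this ring, the spread is deterministic. A smaller fanout trades message overhead against the number of rounds needed to reach every node.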
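The market-based allocation in contribution 3 can be pictured with a very small sealed-bid sketch. The abstract does not specify the auction type or the economic model, so everything here is an assumption for illustration: the Bid fields, the budget and deadline constraints, and the lowest-price rule in allocate_query_plan are hypothetical and not taken from the dissertation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Bid:
    provider: str       # hypothetical query resource provider name
    price: float        # price asked for executing the query plan
    est_seconds: float  # provider's own estimate of execution time


def allocate_query_plan(
    bids: List[Bid], budget: float, deadline_seconds: float
) -> Optional[Bid]:
    """Pick a winning bid for one query plan, or None if none qualifies.

    Assumed rule: discard bids that exceed the budget or the deadline,
    then choose the cheapest; ties go to the faster estimate, then to
    the provider name so the result is deterministic.
    """
    feasible = [
        b for b in bids
        if b.price <= budget and b.est_seconds <= deadline_seconds
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda b: (b.price, b.est_seconds, b.provider))


# Hypothetical bids from three query-layer nodes.
bids = [
    Bid("node-a", price=5.0, est_seconds=12.0),
    Bid("node-b", price=3.0, est_seconds=40.0),  # cheapest, but too slow
    Bid("node-c", price=4.0, est_seconds=20.0),
]
winner = allocate_query_plan(bids, budget=6.0, deadline_seconds=30.0)
print(winner)
```

In this sketch the mediator plays the auctioneer and the query-layer nodes are the bidders. The budget and deadline stand in for the requirements that query plan providers are said to express.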
Keywords/Search Tags: Scientific Data Grid, Distributed Query, Mobile Agent, Small-World, Market Mechanism, Genetic Programming