Research On P2P Data Query Processing Based On Kademlia Network

Posted on:2014-01-19

Degree:Master

Type:Thesis

Country:China

Candidate:Z Q Wang

Full Text:PDF

GTID:2248330398460016

Subject:Computer software and theory

Abstract/Summary:

The emergence of P2P computing gave strong impetus to the development of data integration. A P2P data integration system combines the advantages of P2P technology and data integration to overcome the shortcomings of centralized solutions, giving users access to heterogeneous data with as few barriers as possible. On the one hand, instead of using a huge and complex unified mediated schema to share data, a P2P data integration system builds and maintains semantic mappings only between neighboring data sources, which eases or resolves certain problems of traditional centralized data integration systems. On the other hand, P2P data integration has problems of its own: the characteristics of P2P, such as scalability, decentralization and autonomy, bring new difficulties to data integration. The key issues are how to build and maintain the semantic mappings among peers, how to organize and manage peers to realize data exchange, and how to meet the efficiency and quality requirements of query processing. The main research content of this thesis is therefore to find an appropriate approach that exploits the strengths of both P2P and data integration.

Kademlia, a widely used and efficient protocol for P2P file-sharing systems, has a very clear logical structure. With its node identifier scheme and its XOR distance metric, it can locate the node closest to a given key in Θ(log n) lookup steps. In this thesis, we propose applying Kademlia to P2P data integration and present a new P2P data integration model, Dual-Kad, which combines a Kademlia network over the Peer layer with another over the Super-Peer layer. Dual-Kad provides an effective framework for organizing and managing peers, regulates query routing, and enhances the availability of data sources, thereby improving the performance of P2P data integration.

Firstly, with the assistance of Kademlia over the Super-Peer layer, Dual-Kad can process queries based on semantic logic, which the original Kademlia cannot do. It also shortens the query routing path and caches query results, which speeds up query routing as a whole. Secondly, the Dual-Kad model uses semantic mappings among peers rather than the huge and complex relationships between data sources and a unified mediated schema. These mappings are smaller and more flexible, suit the scalable and dynamic nature of P2P networks, and thus reduce the information loss during data exchange. Thirdly, we study the optimization of sub-query processing and put forward an operator-centric data flow execution model. Through query reconstruction and sub-query delay strategies, we create and exploit data sharing opportunities, reduce communication overhead and therefore improve query performance.
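The XOR distance metric mentioned in the abstract can be illustrated with a minimal sketch. This is generic Kademlia arithmetic, not the Dual-Kad implementation from the thesis: the function names (xor_distance, bucket_index, closest_nodes) and the small 4-bit identifiers are assumptions chosen for illustration, and a real lookup contacts nodes iteratively rather than sorting a full list of known identifiers.

```python
from typing import List


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two identifiers: their bitwise XOR, read as an integer."""
    return a ^ b


def bucket_index(self_id: int, other_id: int) -> int:
    """Index of the routing-table bucket that would hold other_id.

    Bucket i covers distances in [2**i, 2**(i+1)), so the index is the
    position of the highest set bit of the XOR distance.
    """
    d = xor_distance(self_id, other_id)
    if d == 0:
        raise ValueError("a node does not store itself in a bucket")
    return d.bit_length() - 1


def closest_nodes(key: int, node_ids: List[int], k: int = 3) -> List[int]:
    """Return the k known identifiers closest to key under the XOR metric.

    Hypothetical helper: it sorts a local list only to show the ordering
    the metric induces, not to model the network lookup itself.
    """
    return sorted(node_ids, key=lambda n: xor_distance(n, key))[:k]


if __name__ == "__main__":
    known = [0b0001, 0b1001, 0b1100, 0b0111]
    print(xor_distance(0b1010, 0b0110))
    print(bucket_index(0b1010, 0b0110))
    print(closest_nodes(0b1000, known, k=2))
```

Because each bucket covers a range of distances twice as large as the previous one, each lookup step can roughly halve the remaining distance to the key, which is where the logarithmic lookup cost comes from.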

Keywords/Search Tags:

P2P Data Integration, Kademlia Network, Semantic Mapping, Query Reconstruction

Related items

1	Research On P2P Resource Locating Based On Kademlia Network
2	Research On Data Integration Based On Ontology Technology
3	Research On Data Integration Based On Ontology Technology
4	Query Processing And Optimization In Heterogeneous Information Integration
5	A Semantic Description And Data Query Oriented Big Data Organization Method And Researches On Key Application Technologies
6	Research On Building Mapping In Semantic Correctness Oriented Integrated Data Access
7	Based On The Body Of Heterogeneous Data Source Integration System Model And Its Query Processing
8	Research And Application Of Multisource Data Integration Based On Ontology
9	The Data Access And Integration Based On P2P Method Under Grid
10	Research And Implementation Of Ontology-based Heterogeneous Data Integration System