
Research On P2P Data Query Processing Based On Kademlia Network

Posted on: 2014-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Wang
GTID: 2248330398460016
Subject: Computer software and theory
Abstract/Summary:
The emergence of P2P computing has been seen as a strong impetus for the development of data integration. A P2P data integration system combines the advantages of P2P technology and data integration to overcome the shortcomings of centralized solutions, and gives users access to heterogeneous data with as few barriers as possible.

On the one hand, instead of using a huge and complex unified mediated schema to realize data sharing, a P2P data integration system only builds and maintains semantic mappings between neighboring data sources, which eases or resolves certain problems of the traditional centralized data integration system. On the other hand, the P2P data integration system has problems of its own. The characteristics of P2P, such as scalability, decentralization and autonomy, bring new difficulties to data integration. The key ones are: how to build and maintain the semantic mappings among peers, how to organize and manage peers to realize data exchange, and how to achieve efficient, high-quality query processing. The main aim of this thesis is therefore to find an appropriate approach that exploits the strengths of both P2P and data integration.

Kademlia is a widely used and efficient network protocol for P2P file-sharing systems. It has a very clear logical structure, and with its node identifier scheme and its XOR distance metric it can locate the node closest to a given key in O(log n) lookup steps. In this thesis we put forward a method of applying Kademlia to P2P data integration, and propose a new P2P data integration model, Dual-Kad, which combines a Kademlia network over the Peer layer with another over the Super-Peer layer. Dual-Kad provides an effective framework to organize and manage peers, regulates and controls query routing, and enhances the availability of data sources; as a result, it improves the performance of P2P data integration.

Firstly, with the help of Kademlia over the Super-Peer layer, Dual-Kad can process queries based on semantic logic, which the original Kademlia cannot do. It also shortens the query routing path and caches query results, and thereby speeds up query routing as a whole.

Secondly, the Dual-Kad model uses semantic mappings among peers rather than the huge and complex relationship between the data sources and a unified mediated schema. These mappings are smaller and more flexible, and they adapt to the scalability and dynamic nature of the network, which reduces the "information loss" that occurs during data exchange.

Thirdly, we study the optimization of sub-query processing. We put forward an operator-centric data flow execution model. Through query reconstruction and sub-query delay strategies, we create and exploit data sharing opportunities, reduce communication overhead and therefore improve query performance.
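The XOR distance metric mentioned above can be illustrated with a minimal Python sketch. It is not the Dual-Kad model and not code from the thesis: the function names (`xor_distance`, `bucket_index`, `closest_nodes`) and the 4-bit identifiers are assumptions chosen for readability. A real Kademlia node does not hold a global list of peers; it reaches the closest node iteratively through its k-buckets, which is where the O(log n) bound comes from. The sketch only shows how "closest to a key" is defined.

```python
from typing import List


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two identifiers: their bitwise XOR, read as an integer."""
    return a ^ b


def bucket_index(self_id: int, other_id: int) -> int:
    """Index of the k-bucket that `other_id` falls into, seen from `self_id`.

    This is the position of the highest bit in which the two IDs differ.
    Returns -1 when both IDs are equal (a node keeps no bucket for itself).
    """
    return xor_distance(self_id, other_id).bit_length() - 1


def closest_nodes(node_ids: List[int], key: int, k: int = 1) -> List[int]:
    """Return the k node IDs with the smallest XOR distance to `key`.

    Assumption for illustration: the whole ID list is known locally.
    """
    return sorted(node_ids, key=lambda n: xor_distance(n, key))[:k]


if __name__ == "__main__":
    # Hypothetical 4-bit node IDs; real Kademlia uses 160-bit IDs.
    nodes = [0b0001, 0b0100, 0b0111, 0b1100]
    key = 0b0110
    print(closest_nodes(nodes, key, k=2))
    print(bucket_index(0b0110, 0b1100))
```

Because XOR distance is symmetric and each differing high-order bit dominates all lower bits, every lookup step that moves into a closer bucket fixes at least one more leading bit of the key, so the number of steps grows with the ID length actually in use rather than with the number of nodes.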
Keywords: P2P Data Integration, Kademlia Network, Semantic Mapping, Query Reconstruction