The Research Of Complex Query Processing Based On Schema-Matching In P2P Network

Posted on: 2008-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Liu
GTID: 2178360245997872
Subject: Computer Science and Technology
Abstract/Summary:
Peer-to-peer (P2P) systems attempt to provide a decentralized infrastructure for resource sharing. As P2P systems have developed, their applications have evolved from file sharing to database sharing. Sharing relational data in heterogeneous P2P database systems is a challenging problem, especially in the absence of a global schema. Data management in P2P systems is difficult because of the scale of the network and the autonomy and unreliability of peers. Most work on sharing semantically rich data in P2P systems has focused on schema management and on query processing and optimization. Many data management systems have already been developed, but they lack support for complex query processing in the absence of a global schema.

This thesis describes a system called the P2P database system (P2PDBS) that allows users in a P2P network to share their databases. To permit fine-grained sharing of heterogeneous databases, P2PDBS integrates super-peer topologies with semi-automated schema matching techniques. The goal is to bring database management to P2P systems: users can manipulate data in relational style in the absence of a global schema, and the mappings between peers' schemas can be defined automatically. P2PDBS implements super-peer mediated mappings, which define mediated schemas at the super-peer level and pairwise mappings between super-peers. This allows P2PDBS to benefit from the advantages of both approaches. The system uses the super-peer topology to divide the P2P network into communities, each managed by a super-peer.

To be effective, a query-routing strategy should forward queries only to peers that are likely to match them. To reduce network flooding, semantic overlay networks (SONs) are built on top of the super-peer topology. The similarity between communities is computed using the vector space model from information retrieval.
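As a rough illustration of the vector space model applied to communities, the sketch below treats each community's mediated schema as a bag of attribute terms and compares two communities by cosine similarity. The function names, the sample communities, and the use of raw term frequencies as weights are all assumptions made for this example; the abstract does not state which term weighting P2PDBS uses.

```python
import math
from collections import Counter


def term_vector(schema_terms):
    """Build a raw term-frequency vector from a list of schema terms.

    Assumption: plain term counts; the thesis may weight terms differently.
    """
    return Counter(term.lower() for term in schema_terms)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_communities(name, communities, k=1):
    """Return the k community names most similar to `name`."""
    target = term_vector(communities[name])
    scored = [
        (cosine_similarity(target, term_vector(terms)), other)
        for other, terms in communities.items()
        if other != name
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


# Hypothetical communities, each described by its mediated schema terms.
communities = {
    "hospital": ["patient", "doctor", "diagnosis", "name"],
    "clinic": ["patient", "doctor", "name", "appointment"],
    "library": ["book", "author", "title", "name"],
}

print(nearest_communities("hospital", communities, k=1))
```

In a semantic overlay, a super-peer could use such a ranking to choose which other communities to keep as neighbours, so that queries are forwarded first to the communities whose schemas look most alike.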
The system presents a self-configuration method for semantic overlay networks that exploits the transitive nature of schema mappings, so that the most semantically similar communities become neighbours. P2PDBS uses semantic routing indexes to improve query-routing performance.

We present tree-based aggregation query processing and use in-network aggregation to reduce network traffic. Within a community, the super-peer architecture is used to process aggregation queries. We discuss various generic properties of aggregates and show that caching the values of aggregation functions in the mediated schema can improve query response time. Several query optimization techniques, such as ranking and pre-computation, are proposed for aggregation with the MAX and MIN functions.

We study the problem of join query processing in P2PDBS, concentrating on equi-join queries. We propose the concept of a P2P-Join, a join operation that combines tuples from relations on different peers that contain the attributes named in the query. Within a community, semi-joins are used to process join queries. Between communities, we present a join execution method called virtual join that reduces communication cost and local processing cost.

Finally, we evaluate the behavior of P2PDBS using the NetLogo simulator. Experimental results show that SONs can significantly improve query performance and that complex queries are processed efficiently in such P2P networks.
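The semi-join step mentioned for joins within a community can be sketched as follows. This is a generic textbook semi-join for an equi-join between two peers, not the P2PDBS implementation: the function name, the in-memory lists of dictionaries standing in for relations, and the sample data are assumptions for illustration. The example also assumes the two relations share no attribute names other than the join key.

```python
def semi_join_equi_join(local_rows, remote_rows, key):
    """Equi-join two relations held by different peers using a semi-join.

    Returns the joined rows and the number of remote rows that had to be
    shipped, which is the quantity a semi-join tries to keep small.
    """
    # Step 1: the requesting peer projects its relation onto the join key
    # and sends only these key values to the remote peer.
    local_keys = {row[key] for row in local_rows}

    # Step 2: the remote peer keeps only rows whose key appears in that set
    # and ships the reduced relation back (simulated here in one process).
    reduced = [row for row in remote_rows if row[key] in local_keys]

    # Step 3: the requesting peer joins its own rows with the reduced rows.
    by_key = {}
    for row in reduced:
        by_key.setdefault(row[key], []).append(row)

    joined = []
    for left in local_rows:
        for right in by_key.get(left[key], []):
            joined.append({**left, **right})
    return joined, len(reduced)


# Hypothetical relations on two peers in the same community.
patients = [{"pid": 1, "name": "Ann"}, {"pid": 2, "name": "Bob"}]
visits = [
    {"pid": 1, "ward": "A"},
    {"pid": 3, "ward": "B"},
    {"pid": 1, "ward": "C"},
]

rows, shipped = semi_join_equi_join(patients, visits, "pid")
print(rows, shipped)
```

Only the remote rows that can actually contribute to the result cross the network, which is why a semi-join can lower communication cost when few remote tuples match.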
Keywords/Search Tags: P2P database system, schema matching, semantic overlay network, aggregation query, join query