Font Size: a A A

Data Management In Structured Peer-to-Peer System Based On Gray Code

Posted on:2009-10-30Degree:DoctorType:Dissertation
Country:ChinaCandidate:M Q ZhouFull Text:PDF
GTID:1118360272989278Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Peer-to-Peer(P2P in short) system is a self-organized distributed network system. Derived from file sharing,current research hot spots in P2P systems have been shifted to systemic resource sharing and distributed data management.Network applications are boosted immensely by such kind of research,since current network structure is broken through by P2P network systems which provided by a self-organized,highly distributed,scalable networks.Meanwhile,new challenges are brought into network applications under such kind of distributed network structures, such as providing complex query processing operators,estimating data density in the current system and so on.Much researches have been done and much goal has been achieved on resource sharing,simple data management in the P2P system environment,while lots of problems are still needed to be solved imperatively.The focus of this paper is to provide a new P2P system(called GChord),which is improved,optimized from one existing P2P overlay network(say Chord).GChord(Gray Code Based Chord) system provides well support to complex data management,since it combines the routing property of Chord and the excellency of data index based on shuffle-based Gray Code seamlessly.The complex data management supported by current GChord system can be listed as follows:data density estimation in the current network, one-dimensional(multi-dimensional) exact match query,one-dimensional(multidimensional) range query,preference query,multi-attribute(partial match) range query and so on.Moreover,new data management operators will be added to the new generation of GChord system,as our future research focus.The main contribution of this paper is to propose the new GChord system and its data management. The details are given as follows:●GChord system provides seamless combination between node management and data management.Inspired by Chord,GChord system out performance much over Chord.Since its seamless combination of node management and data management,complex data management operators can be supported by GChord while not by Chord.For node management,GChord has the same construction and maintenance as Chord.For data management,GChord provides Gray Code(any two adjacent Gray Codes differ in one-bit) based data index strategies,which in further results in the seamless combination of node management and data management.GChord system not only provides the complexity of O(logn) for node access in the system under its finger table, but also provides the complexity of O(logn) in data access according to its domain routing under its finger tables,where n is the number of nodes in the current system.Besides exact match query which provided by Chord,GChord system gives support to plenty of complex data management types.●GChord system gives support to the global data distribution estimation in the current system.It can benefit many P2P applications,such as load balancing analysis,query processing,data mining,and so on.In this paper,we propose a novel model named distribution-free data density estimation,which is based on distribution-free(i.e.,independent of data distributions) sampling on global cumulative distribution to achieve high estimation accuracy with low estimation cost regardless of distribution models of the underlying data.The idea of distribution-free estimation is deployed in GChord system by sampling a small subset of peers to estimate global data distribution over the data domain.Algorithms on computing and sampling global cumulative distribution based on which global data distribution is estimated are introduced with detailed theoretical analysis.Our extensive performance study confirms the effectiveness and efficiency of our methods under different data distributions in dynamic GChord networks.●GChord system gives support to preference query evaluation which returns top-k results according to users' specific interests has been well studied in centralized databases.However,it has never been discussed in the Peer-to-Peer (P2P) environment and remains imperative to be solved.In this paper, we propose a novel method for efficient preference query processing in GChord system.By effectively calculating a corresponding range which contains top-k results based on estimated data density in the current system using discrete cosine transformed histogram information,a preference query is transformed into a range query for efficient processing.Singular value decomposition of preference matrix is deployed to simplify the progress of range computation. Algorithms on density estimation and range computation are given as well as the theoretical analysis.Our extensive performance study confirms the effectiveness and efficiency of our methods under different data distributions in GChord system.●GChrod system gives support to multi-attribute query.Multi-attribute query is referred to as a query which only contains predicates on attributes that users are interest(i.e.subset of attributes contained in data tuples).That's to say,not only the number of attributes which have predicates in query could be arbitrary,but also the combination of attributes which have predicates in query could be arbitrary.That's why such kind of queries can't be supported well by existing P2P systems.However,GChord system gives well support to it,since it provides seamless combination between node management and data management,resulting in quick routing between data tuples according attribute domains.This paper proposes new algorithms on nodes computing which contains desired results based on Karnaugh map,and new algorithms on node accessing based on multi-cast tree constructed on-the-fly.Our extensive performance study confirms the effectiveness and efficiency of our methods for multi-attribute query in dynamic GChord networks.In a word,this paper gives a detailed discuss on GChord system and its functionality on data management,including overlay network and complex data management. GChord system achieves seamless combination between node management and data management.GChord system provides support to distribution-free data density estimation in the current network,preference query,multi-attribute(partial match) range query,one-dimensional exact match query(since the derivation from Chord system),multi-dimensional exact match and one-dimensional(multidimensional) range query(since the support for multi-attribute range query) and so on.For more applications to be supported in data management,we propose to give them with support in the next generation of GChord system.
Keywords/Search Tags:Peer-to-Peer, distribution-free, data density estimation, preference query, Gray Code, multi-attribute query
PDF Full Text Request
Related items