Font Size: a A A

Distributed indexing and aggregation techniques for peer-to-peer and grid computing

Posted on:2007-01-10Degree:Ph.DType:Dissertation
University:University of Southern CaliforniaCandidate:Cai, MinFull Text:PDF
GTID:1448390005476945Subject:Computer Science
Abstract/Summary:
Peer-to-Peer (P2P) systems and Grids are emerging as two novel paradigms of distributed computing for wide-area resource sharing on the Internet. In these two paradigms, it is essential to discover resources by their attributes and to acquire global information in a fully decentralized fashion. This dissertation proposes a multi-attribute addressable network (MAAN) for resource indexing, a distributed aggregation tree (DAT) for information aggregation, and a distributed counting scheme for estimating global cardinalities.;MAAN indexes resources by their attribute values on Chord via uniform locality preserving hashing. It resolves multi-attribute range queries with a single-attribute dominated algorithm that scales to both the network size and the number of attributes. The DAT scheme implicitly builds an aggregation tree from Chord routing paths without membership maintenance. It also improves the Chord routing algorithm to build a balanced DAT tree, when nodes are evenly distributed in the identifier space. Furthermore, the distributed counting scheme digests large sets into small cardinality summaries and merges them through a DAT tree. The global cardinality is estimated by using an adaptive counting algorithm that not only scales to large cardinalities but also scales to small ones.;Based on these techniques, this dissertation has developed three applications including a P2P replica location service (P-RLS), a distributed RDF repository (RDFPeers) and a worm signature generation system (WormShield). P-RLS extends the Globus RLS system with properties of self-organization, fault-tolerance and improved scalability. It also reduces query hotspots by using a predecessor replication scheme. RDFPeers stores, searches and subscribes to RDF triples in a MAAN network. In RDFPeers, the routing hops for triple insertion, for most query resolution and for triple subscription are logarithmic to the network size.;WormShield automatically generates worm signatures by using distributed fingerprint filtering and aggregation at multiple edge networks. Due to Zipf-like fingerprint distribution, the filtering scheme reduces the amount of aggregation traffic by several orders of magnitude. The global fingerprint statistics are computed through DAT trees in a scalable and load-balanced fashion. The experimental results demonstrate that the indexing and aggregation schemes perform well in different P2P and Grid applications.
Keywords/Search Tags:Distributed, Aggregation, P2P, Indexing, DAT, Scheme
Related items