Studies On Ranking Queries Processing And Classification Technology In P2P Environments

Posted on: 2013-08-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y J Sun
Full Text: PDF
GTID: 1228330467979859
Subject: Computer system architecture

Abstract/Summary:
With the rapid development of computer, network and database technology, storing and managing massive data has become an important problem in the computer field. P2P is a new architectural model in which each peer shares its data, storage and computing resources. P2P also has broad application prospects in the domains of data management, search engines, data stream management and the semantic web. Recently, P2P data management has become a hot topic in database research, and uncertain data query processing and data mining in particular have become core problems of P2P data management. Most existing methods rely on centralized processing, which does not adapt to P2P networks and other distributed environments. To address these issues, this dissertation studies uncertain data query processing and data mining in depth.

Taking the topology of the P2P network into account, we propose several methods for processing related queries over uncertain data. We also study the problem of data classification and propose an OS-ELM based classification algorithm. Our work is summarized as follows:

For "top-k queries over uncertain data in structured P2P network": we first define the P2P top-k query and then propose a novel P2P top-k query processing algorithm, using the Chord topology as an example. Moreover, an upper-bound based pruning strategy and a corresponding improved strategy are proposed on the basis of locality-preserving hashing. Finally, extensive experiments are conducted to show the effectiveness and efficiency of the proposed algorithms.

For "the index based uncertain ranked queries in unstructured P2P network": we propose a novel approach to processing uncertain top-k queries in large-scale P2P networks, where datasets are horizontally partitioned over peers.
In our approach, each peer constructs an Uncertain Quad-Tree (UQ-Tree) index for its local uncertain data, while the P2P network constructs a global index by summarizing the local indexes. Based on the global index, we propose a spatial-pruning algorithm to reduce communication costs and a distributed-pruning algorithm to reduce computation costs. Extensive experiments are conducted to verify the effectiveness and efficiency of the proposed methods in terms of communication costs and response time.

We also propose a novel k nearest neighbor (KNN) query processing method for uncertain data over P2P networks, built on a KNN query processing method for uncertain data in a centralized environment. This method is based on a super-peer network topology and adopts an extended R-tree index, called P2PR-tree, to index the distributed dataset, which addresses multi-dimensional data indexing in the P2P environment. Using two pruning algorithms, we reduce the number of candidates and thereby the computation costs and network overhead of KNN queries. Experiments are conducted to verify the high performance of our method in terms of network costs.

For "the probability based uncertain top-k queries in unstructured P2P network": first, we construct a distributed index using a Quad-tree and, based on the index, propose a spatial pruning algorithm. Second, we derive an upper bound on top-k probabilities from the relationship between local top-k probabilities and global top-k probabilities. We also derive a lower bound on top-k probabilities from the relationship between skyline probabilities and top-k probabilities. Using the two probabilistic pruning algorithms, we further reduce the number of candidates and hence the computation costs and network overhead of top-k queries. Finally, we develop a sampling algorithm to estimate the top-k probabilities of candidates.
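
As an illustration of estimating top-k probabilities by sampling, the following is a minimal sketch and not the dissertation's algorithm. It assumes a tuple-level uncertainty model under possible-worlds semantics, where each tuple has a fixed score and an independent existence probability. The function name `estimate_topk_probabilities`, its parameters, and the sample data are all hypothetical.

```python
import random


def estimate_topk_probabilities(tuples, k, num_samples=10000, seed=0):
    """Estimate, for each tuple, the probability that it ranks in the top k.

    tuples: list of (tuple_id, score, existence_probability).
    Assumption: tuples exist independently of one another.
    """
    rng = random.Random(seed)
    hits = {tid: 0 for tid, _, _ in tuples}
    for _ in range(num_samples):
        # Sample one possible world: each tuple appears with its own probability.
        world = [(score, tid) for tid, score, p in tuples if rng.random() < p]
        # Rank the sampled world by score (descending) and count the top k.
        world.sort(reverse=True)
        for _, tid in world[:k]:
            hits[tid] += 1
    # The fraction of sampled worlds in which a tuple was in the top k.
    return {tid: count / num_samples for tid, count in hits.items()}


# Hypothetical candidates: (id, score, existence probability).
candidates = [("a", 0.9, 1.0), ("b", 0.8, 0.5), ("c", 0.7, 0.5), ("d", 0.1, 0.0)]
estimates = estimate_topk_probabilities(candidates, k=2)
for tid in sorted(estimates):
    print(tid, estimates[tid])
```

For this toy input the exact top-2 probabilities can be worked out by hand: "a" always exists and has the highest score (probability 1), "b" is in the top 2 whenever it exists (0.5), "c" needs to exist while "b" is absent (0.25), and "d" never exists (0). The sampled estimates should approach these values as `num_samples` grows.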
Extensive experiments are conducted to verify the effectiveness and efficiency of the proposed methods.

For "the online data classification in P2P network": we propose an OS-ELM based ensemble classification framework for distributed classification in a hierarchical P2P network. In the framework, we apply the incremental learning principle of OS-ELM to the hierarchical P2P network to generate an ensemble classifier. The ensemble classifier can be implemented in the P2P network in two ways: one-by-one ensemble classification and parallel ensemble classification. Furthermore, we propose a data space coverage based peer selection approach to reduce the high communication cost and large delay. Extensive experimental studies verify the efficiency and effectiveness of the proposed algorithms.
Keywords/Search Tags: uncertain data, possible worlds, peer-to-peer, classification, top-k query, k nearest neighbor query