
Research On The Key Technologies Of Information Retrieval In Super-Peer Networks

Posted on: 2013-05-15, Degree: Doctor, Type: Dissertation
Country: China, Candidate: Y H Tan, Full Text: PDF
GTID: 1228330395485251, Subject: Computer application technology
Abstract/Summary:
Peer-to-peer (P2P) networks play an important role in information retrieval and file sharing thanks to their distributed data storage, the equivalence of nodes, and direct communication between nodes. As networks grow larger, however, network bottlenecks and node failures become more likely. Because super-peer networks offer potential advantages such as high search efficiency, rapid resource location, fault tolerance, and scalability, researchers have begun preliminary studies of information retrieval based on super-peer networks in order to address these problems. Existing results still have shortcomings, and many key problems remain to be solved: how to construct a super-peer network according to the principle of semantic similarity so as to further improve search efficiency and search quality; how to design a constant-degree super-peer topology to further reduce network maintenance cost; how to design detection and recovery mechanisms for node and link failures to improve network fault tolerance; and how to establish a distributed query mechanism that reduces the load on super-peers and improves the quality of retrieval results and of result downloading. Solving these key problems will help meet users' information retrieval needs and provide them with a more convenient and efficient retrieval platform. Research on information retrieval in super-peer networks therefore has both theoretical value and practical significance.

This dissertation aims to improve search efficiency, search quality, and download quality, to reduce network maintenance cost, and to enhance network fault tolerance. It studies the key problems above and applies the results to the design and implementation of a prototype system. The main research contents and contributions are as follows:

(1) On the construction of super-peer networks: earlier super-peer networks have several defects. The connections between super-peers and client peers lack semantic relevance, search efficiency and the quality of retrieval results are low, and super-peer load balancing is not addressed. A novel method of constructing super-peer networks based on online clustering is proposed. First, an improved online clustering algorithm is proposed for highly dynamic peer-to-peer networks. The algorithm ensures that every super-peer is strongly semantically related to the client peers that join it, which provides a good solution to the problem of locating resources accurately. Second, the method adopts an adaptive, dynamic adjustment strategy for choosing a super-peer when a client peer connects. With this strategy, a super-peer can dynamically adjust its number of client peers according to its load capacity. This simplifies super-peer selection and relieves the network bottleneck caused by overloaded super-peers as the network grows. Third, an optimized search mechanism is proposed to improve search efficiency and retrieval results.

(2) On the organization and management of shared documents: current methods of organizing shared documents are not effective for constructing semantic overlay networks. To overcome this drawback, the dissertation proposes a technique called Super-Peer Network based on Hierarchical Cluster Trees. First, a new algorithm arranges the shared documents in peers into hierarchical cluster tree structures according to the semantic features of the clusters. The tree structure arranges clusters logically, and it reflects the relationships between clusters more clearly than a one-level, flat structure. Second, a method based on a multinomial simulation technique is proposed for the hierarchical cluster tree. With this method, clusters are dynamically generated or merged according to the semantic features of the related clusters and documents, rather than according to a fixed threshold as in earlier methods. Third, a construction method for super-peer networks based on hierarchical cluster trees is developed, so that clusters at different levels in client peers can establish links to the cluster trees of the super-peers of semantic overlay networks according to cluster and document feature similarities, which improves search efficiency and decreases network bandwidth cost.

(3) On super-peer topology structures: existing network topologies have defects such as complex maintenance and excessive bandwidth consumption. Two constant-degree super-peer topologies are proposed. First, by analyzing the characteristics of the perfect difference graph (PDG), a novel k-PDG structure is proposed, together with the corresponding construction and maintenance approaches for the super-peer overlay topology. Second, by analyzing the characteristics and deficiencies of the Petersen graph, a novel k-Petersen graph is proposed, along with a new super-peer network based on it. Compared with existing super-peer topologies, the new topologies support approximate queries, reduce bandwidth consumption during search, and decrease the cost of topology construction and maintenance.

(4) On distributed query: existing super-peer networks have problems with the global ranking of retrieval results and with duplicated and invalid download peers. A low-load, high-quality distributed query mechanism is proposed. First, a method of building a data index and selecting a query peer within a super-peer is presented to reduce the burden on super-peers. Second, a distributed ranking approach based on global information is presented to solve the ranking of retrieval results. Third, a strategy for selecting download peers for duplicated results is proposed to reduce retrieval overlap, network transfer cost, and response time.

(5) A prototype system for information retrieval in super-peer networks is designed and implemented. The research presented in this dissertation is realized in the prototype system, including the method of organizing shared documents, the algorithm for constructing super-peer networks, and the mechanisms for searching and sorting retrieval results.
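The two topologies in contribution (3) start from well-known base graphs. The abstract does not specify the k-PDG or k-Petersen constructions, so the following Python sketch covers only the standard base graphs: a perfect difference graph built from a perfect difference set, and the Petersen graph built as the Kneser graph K(5,2). All function names are hypothetical and chosen for illustration. The sketch shows the property that makes such graphs attractive as constant-degree super-peer overlays: every node has the same small degree, and any node reaches any other in at most two hops.

```python
from collections import deque
from itertools import combinations


def is_perfect_difference_set(s, n):
    """True if every nonzero residue mod n arises exactly once as a - b, a != b in s."""
    diffs = [(a - b) % n for a in s for b in s if a != b]
    return sorted(diffs) == list(range(1, n))


def perfect_difference_graph(s, n):
    """Adjacency sets: node i links to (i + d) mod n and (i - d) mod n for each nonzero d in s."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in s:
            if d % n:  # skip the element 0, which would create a self-loop
                j = (i + d) % n
                adj[i].add(j)
                adj[j].add(i)
    return adj


def petersen_graph():
    """Petersen graph as the Kneser graph K(5,2): 2-subsets of {0..4}, adjacent when disjoint."""
    nodes = [frozenset(c) for c in combinations(range(5), 2)]
    return {u: {v for v in nodes if not (u & v)} for u in nodes}


def diameter(adj):
    """Largest breadth-first-search distance between any two nodes (assumes a connected graph)."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


# Illustrative use: {0, 1, 3, 9} is a perfect difference set modulo 13.
pdg = perfect_difference_graph([0, 1, 3, 9], 13)
petersen = petersen_graph()
print(len(pdg), diameter(pdg))
print(len(petersen), diameter(petersen))
```

In this sketch the 13-node perfect difference graph has degree 6 at every node and the 10-node Petersen graph has degree 3, and both have diameter 2. How the dissertation generalizes these graphs to k-PDG and k-Petersen structures for larger super-peer populations is not stated in the abstract.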
Keywords/Search Tags:Super-peer Network, Information Retrieval, Topology Structure, Network Fault Tolerance, Shared Documents Organization