Research On Models And Algorithms Of Distributed Cooperative Search Engine

Posted on: 2012-07-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z D Liu
GTID: 1118330362467965
Subject: Information and Communication Engineering
Abstract/Summary:
The rapid growth of the World Wide Web has become a major challenge for traditional centralized search engines, which fall short in scalability, coverage, freshness, specialization, personalization, and diversification. The rise of Cloud Computing, along with the service-oriented concept, has given birth to many innovations on the Internet and offers a good opportunity to reform the architecture of traditional search engines. These developments lead to distributed search engines, a promising approach suited to the next-generation Internet.

This thesis proposes the model of assignable computations to describe distributed search engines. Within this model, computing power is regarded as a kind of fluid material that can be transferred from suppliers to demanders via the network. Both distributed and centralized search engines can then be explained as equilibria of assignable computations. The thesis points out that distributed search engines take full advantage of the computing resources of websites, which are referred to as nodes in this thesis.

The thesis also compares the two types of distributed search engines, the cooperative type and the uncooperative type. Because cooperative search engines have higher precision and lower computing cost, the thesis focuses on distributed cooperative search engines.

Assigning more computations to nodes is essential for high scalability, but it tends to lower the efficiency of online queries because of unavoidable real-time communication with nodes. The thesis therefore proposes the strategy of dual retrieval, which consists of node-based retrieval hosted by the center and webpage-based retrieval hosted by the nodes.

The thesis designs the dual retrieval architecture of distributed cooperative search engines.
To raise the utility of computations, the thesis proposes a 2-tier Directed MapReduce algorithm for computation assignment, which handles both updating the index descriptions of nodes and answering user queries at the same time. Simulation results show that, compared with other algorithms, the proposed 2-tier Directed MapReduce algorithm effectively reduces the communication cost of online retrieval as well as the average load of servers. Although a slight load imbalance and a kind of retrogradation are possible, the proposed algorithm in general greatly improves the performance of distributed cooperative search engines.

Furthermore, the proposed architecture and algorithm of distributed cooperative search engines are implemented in a practical prototype system called the Confederation Search Engine. The Confederation is made up of several nodes with independent search engines and uses SOAP for communication. Each node, even a heterogeneous one, can easily and seamlessly join the SOA-based Confederation by accepting the proposed protocol and publishing Web Services in the form of WSDL documents.
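The dual retrieval strategy described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the class names `Node` and `Center`, the term-frequency index description, and the simple term-count scoring are all assumptions made for illustration, and real nodes would be contacted over the network (SOAP in the prototype) rather than through direct method calls. The sketch only shows the two stages: the center first ranks nodes using their index descriptions, and only the selected nodes then run webpage-level retrieval.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Node:
    """A website with its own search engine (hypothetical structure)."""
    name: str
    pages: dict  # url -> page text

    def describe(self) -> Counter:
        """Index description sent to the center: term counts over all pages."""
        desc = Counter()
        for text in self.pages.values():
            desc.update(text.lower().split())
        return desc

    def search(self, query: str) -> list:
        """Webpage-based retrieval, hosted by the node."""
        terms = query.lower().split()
        hits = []
        for url, text in self.pages.items():
            words = Counter(text.lower().split())
            score = sum(words[t] for t in terms)
            if score > 0:
                hits.append((score, url))
        return hits


class Center:
    """Holds only node descriptions, not the pages themselves (assumption)."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}
        self.descriptions = {n.name: n.describe() for n in nodes}

    def select_nodes(self, query: str, k: int) -> list:
        """Node-based retrieval, hosted by the center: pick the top-k nodes."""
        terms = query.lower().split()
        scored = [(sum(desc[t] for t in terms), name)
                  for name, desc in self.descriptions.items()]
        ranked = sorted((s for s in scored if s[0] > 0),
                        key=lambda s: (-s[0], s[1]))
        return [name for _, name in ranked[:k]]

    def query(self, query: str, k: int = 2) -> list:
        """Forward the query to the selected nodes only and merge their hits."""
        hits = []
        for name in self.select_nodes(query, k):
            hits.extend(self.nodes[name].search(query))
        return [url for _, url in sorted(hits, key=lambda h: (-h[0], h[1]))]


if __name__ == "__main__":
    center = Center([
        Node("a", {"a/1": "distributed search engine", "a/2": "cooking recipes"}),
        Node("b", {"b/1": "search search ranking"}),
        Node("c", {"c/1": "gardening tips"}),
    ])
    print(center.select_nodes("search", k=2))
    print(center.query("search"))
```

In this sketch, limiting online communication to the `k` selected nodes stands in for the trade-off the abstract describes between assigning computation to nodes and keeping online queries efficient.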
Keywords/Search Tags: Distributed Cooperative Search Engine, Distributed Information Retrieval, Assignable Computation Model, Dual Retrieval, 2-tier Directed MapReduce