Meta-search and distributed search systems

Posted on:2003-04-16

Degree:Ph.D

Type:Thesis

University:Hong Kong University of Science and Technology (People's Republic of China)

Candidate:Shen, Yipeng

Full Text:PDF

GTID:2468390011981673

Subject:Computer Science

Abstract/Summary:

The Web, which has become one of the major information resources nowadays, contains billions of web pages, not including the huge amount of information encapsulated in web databases. Due to this huge and dynamic Web, it is difficult for a single search engine to index all of the web pages and yet keep its index database up-to-date. Meta-search and distributed search systems incorporating many single search engines can alleviate the problems associated with a single search engine.; A meta-search system is a middleware for a large number of underlying search engines. It receives queries from users and redirects them to the most relevant underlying search engines for processing. We propose two cluster based server selection methods (i.e., GSE and DE) in this thesis although they can also be adopted without clustering. The GSE server selection estimates each underlying server's relevance according to the general similarity of the clusters in the server. Only the most similar clusters are counted in the server selection since these clusters tend to contain the majority of the relevant documents in the server. The DE server selection estimates the distance of some optimal documents in the server. These optimal documents, which are believed to be inherently more relevant to the query, contain the query terms in a balanced manner and are also very close to the query point. Experimental results show that both GSE and DE can achieve very good and steady performance.; We also propose a peer-to-peer distributed search system where a large number of autonomous search engines are logically connected in a flat non-hierarchical architecture (network). We model the distributed search process as Markov Decision Processes (MDPs). The estimated relevance of a server to a query is regarded as the reward in the MDP model. Once the MDP policies representing the global knowledge are obtained at each server through the asynchronous value iteration, the most relevant servers to a given query can be efficiently identified despite the lack of centralized control and global knowledge at each autonomous server.; For our specific MDP model, we design a passive asynchronous MDP value iteration based on the propagation of reward messages. The properties of the reward propagation are studied and the value iteration algorithm is optimized correspondingly. In the optimized value iteration, the MDP rewards are propagated along the optimal directions and the unnecessary rewards are discarded (i.e., the reward selection and the propagation are combined). In order to adapt to the dynamic web environment, we also solve the problem of the revision of the MDP policies upon either content or structural adjustments. The effectiveness of our MDP model is verified by extensive experiments.

Keywords/Search Tags:

Search, MDP model, Web, Server, Value iteration

Related items

1	The Research And Design Of Approximation Methods For POMDP
2	Study On Big-Small Iteration Software Development Model And Define Method Of Iteration Scope
3	Based On J2ee Architecture, Eai Applications
4	The Research And Design Of A Personalizing Meta Search Engine Based On Server Side
5	Understanding the Impact of Support for Iteration on Code Searc
6	The Implementation And Research Of The Probabilistic Latent Semantic Analysis Model In The Search Engine's Business Text Classification System
7	Research On Point-based Value Iteration Algorithms In POMDP Domains
8	Station Based On The Index Server Search System Research And Applications
9	Establishment And Application Of The Common Server Database In NBOS
10	L₁-norm Minimization Method Based On Bregman Iteration And Its Applications