Combination of multiple Web search results and its effect on the search performance

Posted on: 2001-05-30
Degree: Ph.D.
Type: Thesis
University: University of Illinois at Urbana-Champaign
Candidate: Dong, Jianhua
Full Text: PDF
GTID: 2468390014955285
Subject: Information Science
Abstract/Summary:
Discriminating relevant documents from all other documents in the Web information space is a challenging problem. The proliferation of differentiated Web search engines makes it difficult for Web users to select appropriate search engines. In addition, a search engine typically indexes only a subset of all documents available on the Web, which makes it unable to produce search results as satisfactory as those achieved by traditional information retrieval (IR) schemes.

This dissertation investigates the effects of applying multiple evidence combination techniques to 30 questions submitted to four popular general-purpose Web search engines: Excite, Hotbot, Infoseek, and Lycos. The 30 queries are organized into three groups: simple queries, moderate queries, and complex queries.

Four hypotheses are investigated in this study. Hypothesis one states that documents appearing in more than one original result tend to be more relevant. Hypotheses two to four, one for each of the three measures adopted (first-twenty precision, average precision over all relevant documents, and precision averaged over 11 standard recall levels), test whether the average performance obtained at a higher level of combination is significantly better than the performance achieved at the levels below it. The experimental process consists of five phases: (1) collecting data; (2) post-processing data; (3) evaluating results; (4) combining results; (5) analyzing the combined results.

The study shows that hypothesis one is supported when the relevance threshold is set to include documents that are at least partially relevant, that hypothesis two is not supported, and that hypotheses three and four are not supported in most cases but are supported when the original results are involved in the comparisons.
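
The three measures named above can be illustrated with a minimal Python sketch. It follows the standard IR definitions of these measures and is not taken from the dissertation. The function names, the choice to divide by k even when fewer than k documents are returned, and the use of interpolated precision at the 11 recall levels are all assumptions made for illustration.

```python
def precision_at_k(ranked, relevant, k=20):
    """Fraction of the first k ranked documents that are relevant.

    Assumption: the denominator is always k, even for shorter result lists.
    """
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked, relevant):
    """Precision at each relevant document's rank, averaged over all
    relevant documents (relevant documents never retrieved contribute 0)."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def eleven_point_precision(ranked, relevant):
    """Interpolated precision averaged over recall levels 0.0, 0.1, ..., 1.0."""
    if not relevant:
        return 0.0
    points = []  # (recall, precision) observed after each rank
    hits = 0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))
    total = 0.0
    for i in range(11):
        level = i / 10
        # Interpolated precision: best precision at any recall >= level.
        total += max((p for r, p in points if r >= level), default=0.0)
    return total / 11
```

Here `ranked` is a list of document identifiers (for example URLs) in the order an engine returned them, and `relevant` is the set of identifiers judged relevant under whatever relevance threshold is in force.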
Some other findings are: complex queries are more likely to retrieve documents from more search engines; more than 15% of retrieved Web links are inactive for various reasons; and, on average, combining more evidence sources does improve Web IR performance. The greatest improvement is achieved at the 2-way level, which combines the ranked output of two search engines.
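
The abstract does not specify the combination technique, so the following Python sketch is only one plausible way to merge ranked outputs at a given combination level. It is an assumption, not the dissertation's method: documents are ordered first by how many engines returned them (in the spirit of hypothesis one) and then by summed reciprocal rank. The names `combine_rankings` and `k_way_combinations` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def combine_rankings(rankings):
    """Merge several ranked URL lists into one ranked list.

    Assumed scheme: sort by the number of lists containing a URL, then by
    the sum of 1/rank over those lists, then alphabetically for stability.
    """
    votes = defaultdict(int)
    score = defaultdict(float)
    for ranking in rankings:
        seen = set()
        for rank, url in enumerate(ranking, start=1):
            if url in seen:  # count each URL once per engine
                continue
            seen.add(url)
            votes[url] += 1
            score[url] += 1.0 / rank
    return sorted(votes, key=lambda u: (-votes[u], -score[u], u))


def k_way_combinations(results_by_engine, k):
    """Build the merged ranking for every k-engine subset.

    results_by_engine maps an engine name to its ranked URL list.
    """
    return {
        names: combine_rankings([results_by_engine[n] for n in names])
        for names in combinations(sorted(results_by_engine), k)
    }
```

With four engines, `k_way_combinations(results, 2)` yields six merged rankings, one per pair of engines, each of which can then be scored with the same precision measures as the original results.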
Keywords/Search Tags: Search, Web, Results, Documents, Performance, Combination, Relevant