Combining text-, link-, and classification-based retrieval methods to enhance information discovery on the Web

Posted on: 2003-09-21
Degree: Ph.D.
Type: Dissertation
University: The University of North Carolina at Chapel Hill
Candidate: Yang, Kiduk
Full Text: PDF
GTID: 1468390011989118
Subject: Information Science
Abstract/Summary:
The massive, heterogeneous, and dynamic Web document collection diminishes the effectiveness of traditional Information Retrieval (IR) approaches. At the same time, the Web is rich in sources of information that go beyond the contents of documents, such as hyperlinks and Web directories (e.g., Yahoo). Past fusion IR studies have repeatedly shown that combining multiple sources of evidence (i.e., fusion) can improve retrieval performance. This dissertation extends those studies by investigating the effects of combining three distinct retrieval approaches for Web IR: the text-based approach, which leverages document texts; the link-based approach, which leverages hyperlinks; and the classification-based approach, which leverages Yahoo categories.

The retrieval results of the text-, link-, and classification-based methods were combined using variations of the linear combination formula to produce the fusion results, which were compared to the individual retrieval results using traditional retrieval evaluation metrics. Fusion results were also examined to ascertain the significance of overlap (i.e., the number of systems that retrieve a document) in fusion. Although the best fusion result performed only marginally better than the best individual result, the analysis of overlap strongly suggested that the solution spaces of the text-, link-, and classification-based retrieval methods were diverse enough for fusion to be beneficial. Furthermore, analysis of the results yielded insight into important characteristics of the fusion environment, such as the effects of system parameters and the relationship between overlap, document ranking, and relevance.

The main contribution of this dissertation lies in its confirmation of the viability of fusion for Web IR, not only by determining that fusion potential exists in the combined solution spaces of the text-, link-, and classification-based retrieval methods but also by demonstrating that a relatively simple implementation of fusion does improve retrieval performance.
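
As a rough illustration of the linear combination fusion and the overlap count described in the abstract, the following Python sketch may help. The abstract does not give the dissertation's exact formula variants, so everything here is an assumption made for illustration: the min-max score normalization, the function names (`min_max_normalize`, `linear_fusion`), the weights, and the toy scores.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one system's scores to [0, 1] (an assumed normalization)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every retrieved document as a full match.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def linear_fusion(runs, weights):
    """Weighted sum of normalized scores across systems.

    runs    : {system_name: {doc_id: raw_score}}
    weights : {system_name: weight}  (hypothetical values, not from the source)
    Returns (ranking, fused_scores, overlap), where overlap[doc] is the
    number of systems that retrieved the document.
    """
    fused = defaultdict(float)
    overlap = defaultdict(int)
    for name, scores in runs.items():
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += weights[name] * s
            overlap[doc] += 1
    # Sort by descending fused score; break ties by document id.
    ranking = sorted(fused, key=lambda d: (-fused[d], d))
    return ranking, dict(fused), dict(overlap)


# Toy scores for three hypothetical systems.
runs = {
    "text": {"d1": 2.0, "d2": 1.0, "d3": 0.0},
    "link": {"d2": 10.0, "d4": 5.0},
    "class": {"d2": 0.9, "d1": 0.3},
}
weights = {"text": 0.5, "link": 0.3, "class": 0.2}

ranking, fused, overlap = linear_fusion(runs, weights)
print(ranking)   # documents ordered by fused score
print(overlap)   # how many systems retrieved each document
```

In this toy run, the document retrieved by all three systems ranks first, which mirrors the kind of relationship between overlap and ranking that the abstract says was analyzed; it is not evidence about the dissertation's actual results.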
Keywords/Search Tags: Retrieval, Web, Link-based, Fusion, Information, Text-based, Combining