Using contextual information and machine learning technique to improve retrieval performance

Posted on: 2007-10-13
Degree: M.Sc.
Type: Thesis
University: York University (Canada)
Candidate: Huang, Yan Rui
GTID: 2448390005967266
Subject: Computer Science
Abstract/Summary:
Information Retrieval (IR) refers to finding information in large amounts of data, and is among the most useful technologies for overcoming information overload. Given a user's query, how can an information retrieval system return the documents that are most likely to be relevant? This thesis presents two approaches to improving retrieval accuracy: using contextual information, and applying machine learning methods to pseudo-relevance feedback.

Traditional IR systems focus on the content of the search but ignore user needs. Different users may have different information needs, and they may use the same query to search for different kinds of information. Therefore, to obtain an optimal search result, user and contextual information must be considered.

In this thesis, we present a contextual retrieval framework that incorporates user and global contextual information into the probabilistic retrieval model. We investigate in detail how contextual information can be exploited to improve retrieval performance. In particular, (1) we use related-text contextual information for query expansion; (2) we use granularity information to develop a dual index model, which constructs both a document-level index and a paragraph-level index; (3) we use geographic information for filtering. In addition, a new term weighting function, BM50, is proposed based on the global contextual information. The framework is adaptable and extensible.

In this thesis, we also propose using machine learning methods to boost the performance of pseudo-relevance feedback, a technique commonly used in IR to improve retrieval performance. Its basic idea is to extract expansion terms from the top-ranked documents of the initial retrieval and use them to formulate a new query for a second round of retrieval. The effect of pseudo-relevance feedback relies strongly on the quality of the expansion terms selected from the top-ranked documents. One way to improve performance is therefore to improve the quality of the documents chosen for query expansion. Here, we apply machine learning on top of pseudo-relevance feedback to choose the relevant documents. In particular, we incorporate a co-training algorithm into the retrieval system at the feedback stage.

Extensive experiments have been conducted. The experimental results show that both approaches are effective in improving retrieval performance.
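
The basic pseudo-relevance feedback loop described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the term-count scoring function stands in for the probabilistic model and BM50 (whose definition is not given here), the co-training step is omitted, and the names `tokenize`, `score`, `rank`, `expand_query`, `top_k`, and `n_terms` are all hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Simplistic whitespace tokenizer (an assumption for illustration).
    return text.lower().split()


def score(query_terms, doc_tokens):
    # Toy relevance score: total occurrences of the query terms in the document.
    # A real system would use a probabilistic weighting function instead.
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in query_terms)


def rank(query_terms, docs):
    # Document indices sorted by descending score; ties keep input order.
    scores = [score(query_terms, tokenize(d)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])


def expand_query(query_terms, docs, top_k=2, n_terms=2):
    # Round 1: retrieve, then assume the top_k documents are relevant.
    top = rank(query_terms, docs)[:top_k]
    pool = Counter()
    for i in top:
        pool.update(t for t in tokenize(docs[i]) if t not in query_terms)
    # Keep the most frequent new terms; break ties alphabetically.
    ordered = sorted(pool.items(), key=lambda kv: (-kv[1], kv[0]))
    expansion = [term for term, _ in ordered[:n_terms]]
    return list(query_terms) + expansion


docs = [
    "jaguar car engine speed car",
    "jaguar car dealer engine",
    "jaguar cat jungle",
    "weather report today",
]
query = ["jaguar", "car"]
expanded = expand_query(query, docs)   # query for the second round
second_round = rank(expanded, docs)    # re-rank with the expanded query
```

In this sketch, the quality of `expanded` depends entirely on whether the `top_k` documents from the first round are actually relevant, which is the step the thesis proposes to improve by selecting feedback documents with a learned classifier.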
Keywords/Search Tags: Retrieval, Information, Machine learning, Pseudo-relevance feedback