Using contextual information and machine learning technique to improve retrieval performance

Posted on:2007-10-13

Degree:M.Sc

Type:Thesis

University:York University (Canada)

Candidate:Huang, Yan Rui

GTID:2448390005967266

Subject:Computer Science

Abstract/Summary:

Information Retrieval (IR) refers to finding information in large amounts of data, and is among the most useful technologies for overcoming information overload. Given a user's inquiry, how can an information retrieval system return the documents that are most likely to be relevant? This thesis presents the use of contextual information, and the application of machine learning methods to pseudo-relevance feedback, to improve retrieval accuracy.

Traditional IR systems focus on the content of the search but ignore user needs. However, different users may have different information needs, and may use the same query to search for different kinds of information. Therefore, to obtain an optimal search result, user and contextual information must be considered.

In this thesis, we present a contextual retrieval framework which incorporates user and global contextual information into the probabilistic retrieval model. We investigate in detail how contextual information can be exploited to improve information retrieval performance. In particular: (1) we use related-text contextual information for query expansion; (2) we use granularity information to develop a dual index model, which constructs both a document-level index and a paragraph-level index; (3) we use geographic information for filtering. In addition, a new term weighting function, BM50, is proposed based on the global contextual information. This framework is adaptable and extensible.

We also propose using machine learning methods to boost the performance of pseudo-relevance feedback, a technique commonly used in IR to improve retrieval performance. Its basic idea is to extract expansion terms from the top-ranked documents of the initial retrieval and use them to formulate a new query for a second round of retrieval. The effect of pseudo-relevance feedback relies strongly on the quality of the expansion terms selected from the top-ranked documents. One way to improve performance is therefore to improve the quality of the documents chosen for query expansion. Here, we present the use of machine learning on top of pseudo-relevance feedback to choose the relevant documents. In particular, we incorporate a co-training algorithm into the retrieval system at the feedback stage.

Extensive experiments have been conducted. The experimental results show that both approaches are effective in improving retrieval performance.
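
The basic pseudo-relevance feedback loop described in the abstract (initial retrieval, term extraction from the top-ranked documents, second-round retrieval) can be illustrated with a minimal sketch. It is not the thesis's implementation: the TF-IDF scorer stands in for the probabilistic model and BM50, whose definitions are not given here, and the function names, the toy corpus, and the `top_docs` and `num_terms` parameters are all assumptions made for illustration. The co-training step for selecting feedback documents is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def rank(query_terms, docs):
    """Return document indices ordered by a simple TF-IDF score.

    Assumption: this scorer is only a stand-in for the probabilistic
    retrieval model used in the thesis.
    """
    n = len(docs)
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    scored = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = sum(tf[t] * math.log(1 + n / df[t])
                    for t in query_terms if t in tf)
        scored.append((score, i))
    # Highest score first; ties broken by document index.
    return [i for _, i in sorted(scored, key=lambda s: (-s[0], s[1]))]


def expand_query(query_terms, docs, ranking, top_docs=2, num_terms=2):
    """Add the most frequent non-query terms from the top-ranked documents."""
    counts = Counter()
    for i in ranking[:top_docs]:
        counts.update(t for t in tokenize(docs[i]) if t not in query_terms)
    # Most frequent first; ties broken alphabetically for determinism.
    best = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:num_terms]
    return list(query_terms) + [t for t, _ in best]


def pseudo_relevance_feedback(query, docs, top_docs=2, num_terms=2):
    """Run the initial retrieval, expand the query, and retrieve again."""
    query_terms = tokenize(query)
    first = rank(query_terms, docs)
    expanded = expand_query(query_terms, docs, first, top_docs, num_terms)
    second = rank(expanded, docs)
    return first, expanded, second


# Toy corpus, invented for this example.
DOCS = [
    "jaguar car engine speed",
    "jaguar car dealer price car",
    "jaguar cat jungle habitat",
    "engine repair car manual",
]

first, expanded, second = pseudo_relevance_feedback("jaguar car", DOCS)
print(first, expanded, second)
```

In this toy run the expansion terms come from the two top-ranked documents, so the second-round ranking depends entirely on whether those documents were in fact relevant. That dependence is the weakness the thesis targets by choosing the feedback documents more carefully.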

Keywords/Search Tags:

Retrieval, Information, Machine learning, Pseudo-relevance feedback

Related items

1	Application Of Ties, Transfer Learning And Pseudo Feedback On Learning To Rank
2	The Research Of Machine Learning Techniques And External Web Resources For Relevance Feedback
3	Cross Language Information Retrieval Based On Topical Pseudo Relevance Feedback
4	Research On Pre-trained BERT Based Pseudo-relevance Feedback Method
5	A Study Of Collection-based Features For Adapting The Balance Parameter In Pseudo Relevance Feedback
6	Research On Pseudo Relevance Feedback Based On Document Similarity
7	Research On The Relevance Feedback Based On Log Learning For Image Retrieval
8	Image Retrieval Based On Relevance Feedback
9	Research On Pseudo Relevance Feedback Query Expansion Technology Based On Latent Semantic Relation
10	The Research On Query Understanding And Positive-Negative Relevance Feedback Approaches