Combining Vector Space Model And Language Model To Information Retrieval

Posted on: 2007-09-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Yang
Full Text: PDF
GTID: 2178360212980083
Subject: Computer applications
Abstract/Summary:
Information retrieval (IR) systems are based, directly or indirectly, on models of the retrieval process. These models specify how the representations of documents and queries should be compared in order to estimate the likelihood of relevance.

As retrieval models were developed, it was observed early in experimentation that different models showed surprisingly low overlap in the relevant documents they found. Combining different text representations and search strategies has therefore become a standard technique for improving retrieval effectiveness.

This combination approach to IR can be modeled as combining the outputs of classifiers. Under this framework, the best results are achieved when the classifiers produce good probability estimates and are independent of one another.

The vector space model (VSM) is a classic retrieval model. Since it was introduced in 1958, the VSM has performed consistently well in retrieval tasks. The statistical language model is a newer retrieval model, developed in recent years, that approaches retrieval in a different way. Combining the two models in a single system can therefore be expected to improve performance substantially.

Term independence is one of the basic assumptions of the VSM, and as a result the model carries no word-order information. Yet the relative order of words is informative in almost all applications. A higher-order n-gram language model captures word order to some degree, which makes it a natural candidate for combination with the VSM.

In this work we design and implement an information retrieval system that combines the VSM with a bigram language model. A simple linear combination method is used to combine the two ranking algorithms.
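
The linear combination of two rankers can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not say how scores are normalized or weighted, so the min-max normalization, the `weight` parameter, and the function names `min_max_normalize`, `combine_scores`, and `rank` are all assumptions made for illustration.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; if all scores are equal, map them to 0.0.

    Normalization is an assumption here: VSM cosine scores and language
    model log-probabilities live on different scales.
    """
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine_scores(vsm: Dict[str, float], lm: Dict[str, float],
                   weight: float = 0.5) -> Dict[str, float]:
    """Linear combination: weight * VSM + (1 - weight) * LM (hypothetical weight)."""
    vsm_n = min_max_normalize(vsm)
    lm_n = min_max_normalize(lm)
    docs = set(vsm_n) | set(lm_n)
    # A document missing from one ranker contributes 0.0 for that ranker.
    return {d: weight * vsm_n.get(d, 0.0) + (1 - weight) * lm_n.get(d, 0.0)
            for d in docs}


def rank(scores: Dict[str, float]) -> List[str]:
    """Sort document ids by descending score, breaking ties by id."""
    return sorted(scores, key=lambda d: (-scores[d], d))


# Made-up scores for three documents.
vsm_scores = {"d1": 1.0, "d2": 0.5, "d3": 0.0}       # e.g. cosine similarity
lm_scores = {"d1": -12.0, "d2": -8.0, "d3": -10.0}   # e.g. log-probability
combined = combine_scores(vsm_scores, lm_scores, weight=0.5)
print(rank(combined))
```

In this toy example the document ranked second by the VSM moves to the top because the language model scores it highest; in practice the weight would be tuned on held-out queries.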
To keep computation simple and improve performance, the system is implemented in two stages, with the language model used for reranking in the second stage. Experiments on a TREC document collection show that, compared with the VSM or the language model alone, the combined approach achieves higher precision at all recall levels, and the mean 11-point average (11-AVG) precision also improves significantly.
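
For readers unfamiliar with the evaluation measure, the sketch below computes an 11-point interpolated average precision for a single query, assuming that "11-AVG" refers to the standard measure over recall levels 0.0, 0.1, ..., 1.0. The function name `eleven_point_average_precision` and the example data are hypothetical; averaging the value over all queries would give the mean reported above.

```python
from typing import List, Set


def eleven_point_average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Average of interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    if not relevant:
        return 0.0
    # (recall, precision) at each rank where a relevant document appears.
    points = []
    hits = 0
    for position, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / position))
    total = 0.0
    for i in range(11):
        level = i / 10
        # Interpolated precision: best precision at any recall >= level.
        candidates = [p for r, p in points if r >= level]
        total += max(candidates, default=0.0)
    return total / 11


# Made-up ranking: relevant documents at ranks 1 and 3.
score = eleven_point_average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
print(round(score, 4))
```

Here the interpolated precision is 1.0 for recall levels 0.0 through 0.5 and 2/3 for levels 0.6 through 1.0, so the result is 28/33.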
Keywords/Search Tags:information retrieval, vector space model, language model, n-gram model, combination model