
Passage Retrieval System Based On Language Model

Posted on: 2018-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: H W Zhang
GTID: 2348330536484929
Subject: Computer software and theory
Abstract/Summary:
The Language Model (LM) approach is a relatively new framework for Information Retrieval (IR). Its basic idea is that the IR system estimates a language model for each document and ranks documents by the likelihood of the query under that model. Research by a number of groups has confirmed that the language modeling approach is a theoretically attractive and potentially very effective probabilistic framework for studying information retrieval problems. Although language models have been applied successfully to information retrieval, there is still room for improvement. This thesis extends the existing language model in the following ways.

First, we propose a new retrieval model based on the classical Query Likelihood Model (QLM) that applies passage retrieval, namely the Passage Language Model (PLM). The PLM inherits the advantages of the QLM, a complete theoretical foundation and outstanding retrieval effectiveness, and additionally takes the passage structure of each document into account during retrieval. This improves the effectiveness of the query likelihood model on long documents, especially documents that cover many topics.

Second, we combine the passage language model with a new Query Expansion (QE) method, HQE (Heuristic Query Expansion), to reduce the risk of word mismatch. HQE is based on pseudo-relevance feedback; it not only overcomes the disadvantages of relevance feedback but also improves the effectiveness of the strategy used in QE. Our experiments show that HQE outperforms the classical query expansion method: compared with the classical method, the maximum MAP improvement of the PLM model is 54.7%.

Finally, we propose a new smoothing method based on Dirichlet smoothing, namely a cluster-based smoothing method, in which both document statistics (the number of unique terms in the document) and clustering algorithms are integrated into the PLM to help improve the accuracy of language model estimation...
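Query-likelihood ranking with Dirichlet smoothing, applied at the passage level, can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's PLM: the abstract does not say how passages are formed or how passage scores are combined into a document score. The fixed-size overlapping windows (`size`, `step`), the max-passage document score, the default `mu = 2000`, and the function names `dirichlet_log_likelihood`, `passages` and `rank` are all hypothetical choices. Each text unit d is scored with the standard Dirichlet estimate p(w | d) = (tf(w, d) + μ·p(w | C)) / (|d| + μ), where p(w | C) is the term's relative frequency in the whole collection.

```python
import math
from collections import Counter


def dirichlet_log_likelihood(query, text, coll_tf, coll_len, mu=2000.0):
    """Return log P(query | text) under a Dirichlet-smoothed unigram model.

    p(w | text) = (tf(w, text) + mu * p(w | C)) / (|text| + mu)
    """
    tf = Counter(text)
    n = len(text)
    score = 0.0
    for w in query:
        p_c = coll_tf.get(w, 0) / coll_len
        if p_c == 0.0:
            # Assumption: query terms absent from the whole collection are skipped.
            continue
        score += math.log((tf[w] + mu * p_c) / (n + mu))
    return score


def passages(doc, size=50, step=25):
    """Split a token list into overlapping fixed-length windows (hypothetical passage scheme)."""
    if len(doc) <= size:
        return [doc]
    windows = [doc[i:i + size] for i in range(0, len(doc) - size + 1, step)]
    if (len(doc) - size) % step != 0:
        # Add a final window so the tail of the document is also covered.
        windows.append(doc[-size:])
    return windows


def rank(query, docs, mu=2000.0, size=50, step=25):
    """Rank documents by their best passage's query log-likelihood (max-passage assumption)."""
    coll_tf = Counter(w for doc in docs.values() for w in doc)
    coll_len = sum(coll_tf.values())
    scores = {
        doc_id: max(
            dirichlet_log_likelihood(query, p, coll_tf, coll_len, mu)
            for p in passages(doc, size, step)
        )
        for doc_id, doc in docs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy collection: d1 is long but has one passage dense with the query terms.
    docs = {
        "d1": ["language", "model", "retrieval"] * 2 + ["noise"] * 100,
        "d2": ["noise", "cooking", "recipe"] * 40,
    }
    for doc_id, score in rank(["language", "model"], docs):
        print(doc_id, round(score, 3))
```

Taking the maximum over passages lets a long, multi-topic document rank highly when one of its passages matches the query well, which is the motivation the abstract gives for PLM. The HQE expansion and the cluster-based smoothing are not modelled in this sketch.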
Keywords/Search Tags: language model, passage retrieval, query expansion, smoothing method