Research On Information Retrieval Models Based On Statistical Language Model And Passage Feature

Posted on: 2008-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: K Dang
Full Text: PDF
GTID: 2178360245997693
Subject: Computer Science and Technology
Abstract/Summary:
The information retrieval (IR) model is an abstract description of the IR task and of the method used to implement it. Because the IR model is at the core of IR research, work on it is of both theoretical and practical significance. Since the statistical language model was first applied to IR, it has been regarded as a very good IR framework and has been widely studied. The passage is an effective linguistic feature in IR. This thesis studies IR models that are based on the statistical language model framework and the passage feature. Specifically, its main content is as follows:

1. This thesis reviews the classical IR models and the models extended from them, analyzes the statistical language model as used in IR together with its smoothing methods, and discusses the classification of passages. It then proposes a new IR model, PJM. By extending Jelinek-Mercer smoothing, the new model incorporates the passage feature into the language model framework. Experiments on the TREC collections show that the new model performs significantly better than the simple language model. This thesis also introduces Lemur, the platform on which the experiments were run.

2. This thesis further investigates how to use the passage feature within the statistical language model framework. It analyzes two directions of passage research: the forms that passages take and the ways in which passages are used. It summarizes the methods in both directions, laying a foundation for future research, and then compares the PJM model with the method proposed by other researchers. The experiments demonstrate that, within the statistical language model framework, combining passage-level and document-level information improves performance over using passage-level information alone.

3. By combining different smoothing techniques (Jelinek-Mercer and Dirichlet), this thesis extends the PJM model into three new IR models. The experimental results show that the new models significantly outperform the simple language model and perform comparably to the PJM model.
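
The abstract names Jelinek-Mercer and Dirichlet smoothing but does not give the PJM formula, so the following is only a minimal sketch under stated assumptions. The functions jm_prob and dirichlet_prob follow the standard textbook forms of the two smoothing methods. The function passage_jm_prob is a hypothetical three-way interpolation of passage, document, and collection models, included to illustrate what "extending Jelinek-Mercer smoothing with a passage feature" could look like; it is not the thesis's actual model. All class, function, and parameter names (UnigramLM, lam, mu, alpha, beta) are illustrative and do not come from the thesis or from Lemur.

```python
import math
from collections import Counter


class UnigramLM:
    """Maximum-likelihood unigram model over a list of tokens."""

    def __init__(self, tokens):
        self.counts = Counter(tokens)
        self.length = len(tokens)

    def prob(self, term):
        # Relative frequency; 0.0 for unseen terms or an empty text.
        return self.counts[term] / self.length if self.length else 0.0


def jm_prob(term, doc, coll, lam=0.5):
    """Jelinek-Mercer: (1 - lam) * P_ml(t|d) + lam * P(t|C)."""
    return (1.0 - lam) * doc.prob(term) + lam * coll.prob(term)


def dirichlet_prob(term, doc, coll, mu=2000.0):
    """Dirichlet prior: (c(t, d) + mu * P(t|C)) / (|d| + mu)."""
    return (doc.counts[term] + mu * coll.prob(term)) / (doc.length + mu)


def passage_jm_prob(term, passage, doc, coll, alpha=0.3, beta=0.3):
    """ASSUMPTION: hypothetical passage-aware interpolation, not the PJM formula.

    alpha weights the passage model, beta the document model, and the
    remaining mass (1 - alpha - beta) goes to the collection model.
    """
    if alpha < 0 or beta < 0 or alpha + beta > 1:
        raise ValueError("alpha and beta must be non-negative and sum to at most 1")
    return (alpha * passage.prob(term)
            + beta * doc.prob(term)
            + (1.0 - alpha - beta) * coll.prob(term))


def query_log_likelihood(query_terms, term_prob):
    """Sum of log P(t | model) over query terms (query-likelihood ranking score).

    term_prob must return a value greater than zero for every query term,
    which holds when each query term occurs somewhere in the collection.
    """
    return sum(math.log(term_prob(t)) for t in query_terms)


# Toy data, invented for illustration only.
doc = UnigramLM("language model retrieval language".split())
passage = UnigramLM("language model".split())
coll = UnigramLM("language model retrieval language "
                 "passage retrieval smoothing model".split())

score = query_log_likelihood(
    ["language", "passage"],
    lambda t: passage_jm_prob(t, passage, doc, coll, alpha=0.4, beta=0.4),
)
```

In this sketch a document would be scored once per passage and the scores combined (for example by taking the best passage); how the thesis selects or combines passages is not stated in the abstract.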
Keywords/Search Tags: information retrieval model, statistical language model, passage, smoothing