
Semantic Positional Language Retrieval Models with Proximity Information

Posted on: 2015-06-23 | Degree: Master | Type: Thesis
Country: China | Candidate: X L Gong | Full Text: PDF
GTID: 2298330431498600 | Subject: Computer Science and Technology
Abstract/Summary:
Over the past few decades, many classic models have emerged in the field of information retrieval, such as the Boolean model, the vector space model, and the probabilistic model. In 1998, Ponte and Croft first applied statistical language models to information retrieval and proposed the query likelihood language model, which has developed rapidly in recent years. Many scholars have since joined this line of research, and extensive experiments have led in turn to hidden Markov models, statistical translation models, and the risk minimization framework for retrieval. However, most of these retrieval models rely only on the frequency of words in a document and do not consider the positional relationships between words. To address this issue, Lv and Zhai proposed positional language models. The biggest advantage of this model is that it takes the positional relationships of words in a document into account, but it also has some defects. Yu and Wang then improved on it, proposing a new model called the positional language model with semantic information, and applied it successfully to information retrieval. The retrieval component of their model directly applies interpolation smoothing (Jelinek-Mercer) and does not consider the positional relationships of the query terms in the document. This thesis therefore builds on their work. Recent studies show that using the positions of matched query terms in documents can improve the precision of query results. How to better express and model the positional information of query terms in a document is one of the open problems in improving retrieval performance.
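The difference between the two smoothing methods named in this abstract can be sketched as follows. This is a minimal illustration of the standard textbook forms of Jelinek-Mercer and Dirichlet prior smoothing for query likelihood scoring, not the thesis's own implementation; the function names, the token-list representation, and the default values of `lam` and `mu` are assumptions made for the example.

```python
import math
from collections import Counter


def jm_score(query, doc, collection, lam=0.5):
    """Query log-likelihood with Jelinek-Mercer (linear interpolation) smoothing.

    query, doc, collection are lists of tokens (hypothetical representation).
    Every query term must occur in the collection, otherwise log(0) is hit.
    """
    doc_tf, col_tf = Counter(doc), Counter(collection)
    score = 0.0
    for w in query:
        p_doc = doc_tf[w] / len(doc)          # maximum-likelihood document model
        p_col = col_tf[w] / len(collection)   # collection (background) model
        score += math.log((1 - lam) * p_doc + lam * p_col)
    return score


def dirichlet_score(query, doc, collection, mu=2000.0):
    """Query log-likelihood with Dirichlet prior smoothing.

    The amount of smoothing depends on document length: short documents
    are pulled more strongly toward the collection model.
    """
    doc_tf, col_tf = Counter(doc), Counter(collection)
    score = 0.0
    for w in query:
        p_col = col_tf[w] / len(collection)
        score += math.log((doc_tf[w] + mu * p_col) / (len(doc) + mu))
    return score
```

Jelinek-Mercer mixes the document and collection models with a fixed weight, while the Dirichlet prior adds `mu` pseudo-counts drawn from the collection model, so its effective interpolation weight varies with document length.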
Building on the semantic positional language model (SPLM), this thesis further considers term proximity information. We use a Dirichlet prior distribution as the smoothing method for computing proximity, and present a semantic positional language retrieval model with proximity information. Specifically, the main work and innovations of this thesis are as follows:
1) First, we consider different kinds of kernel functions and address the problems of using the Gaussian function to measure positional relationships in the original model. We then describe how to combine the proximity computation model with the language model.
2) Ranking search results is a fundamental problem in information retrieval. Drawing on statistical probability and algorithms of linear complexity, we propose a proximity SPLM retrieval model for information retrieval. Following the idea of combining proximity information with a language model, we give a way to combine proximity information with the SPLM model using the Dirichlet smoothing method. Furthermore, we systematically compare the performance of our retrieval model with that of the SPLM model, and analyze the efficiency of the Dirichlet prior method against the JM smoothing method used in the SPLM model.
3) Our experiments show that our retrieval model performs better than the SPLM model in information retrieval. We also give a sensitivity analysis of the parameters of our model and compare different proximity strategies. Finally, we analyze the algorithmic complexity of the different types of proximity measures.
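As a rough illustration of how a position kernel and Dirichlet smoothing can fit together, the sketch below follows the general positional language model idea of propagating each word occurrence to nearby positions with a Gaussian kernel and then smoothing the resulting position-specific model with a Dirichlet prior. The abstract does not give the thesis's formulas, so everything here (function names, the `sigma` and `mu` defaults, the token-list representation) is an assumption, and the semantic and proximity components of the proposed model are not reproduced.

```python
import math


def gaussian_kernel(i, j, sigma=2.0):
    """Weight that an occurrence at position j contributes to position i."""
    return math.exp(-((i - j) ** 2) / (2 * sigma ** 2))


def propagated_count(term, doc, i, sigma=2.0):
    """Kernel-weighted count of `term` as seen from position i of `doc`."""
    return sum(gaussian_kernel(i, j, sigma)
               for j, w in enumerate(doc) if w == term)


def positional_dirichlet_prob(term, doc, i, p_collection, mu=10.0, sigma=2.0):
    """Dirichlet-smoothed probability of `term` at position i.

    p_collection is the background probability of `term` (assumed to be
    supplied by the caller); z is the total propagated mass at position i.
    """
    c = propagated_count(term, doc, i, sigma)
    z = sum(gaussian_kernel(i, j, sigma) for j in range(len(doc)))
    return (c + mu * p_collection) / (z + mu)
```

A smaller `sigma` concentrates each position's model on its immediate neighbors, while a larger one approaches the ordinary whole-document language model. Other kernel shapes can be swapped into `gaussian_kernel` without changing the smoothing step.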
Keywords/Search Tags: semantic positional language models, Dirichlet smooth, proximity model, retrieval model