
Positional Language Models With Semantic Information

Posted on: 2013-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: W Yu
Full Text: PDF
GTID: 2298330377459820
Subject: Computer Science and Technology

Abstract/Summary:
Over the past four decades, many classic models have been developed in the field of information retrieval, such as the Boolean model, the vector space model, and the probabilistic model. Since Ponte and Croft first proposed the statistical language model-based retrieval model, this approach has developed rapidly. Many scholars have joined this line of research and have done a great deal of meaningful work; hidden Markov models, statistical translation models, and the risk minimization framework have been proposed in turn.

However, most retrieval models are based on the frequency of words in a document and do not consider the positions of words within it. Suppose two documents contain the same set of words and the frequency of each word is the same in both; the only difference is the order in which the words appear. Most retrieval models would then assign the two documents the same retrieval score. Yet if the query words appear closer together in the first document than in the second, the first document should clearly receive a higher retrieval score.

Motivated by this issue, Lv and Zhai proposed positional language models and applied them to information retrieval. The main advantage of this model is that it considers the positional relationships of words in a document, but it does not take the semantic relationships between words into account. Building on this, in this paper we propose a new model, named positional language models with semantic information.

Specifically, the main work and innovations of this thesis are as follows:

1) We propose a new technique called smoothed mutual information.
It is used to measure the transition probability between two words. Because of the sparsity of words in the dataset, using raw mutual information to measure the transition probability would leave many word pairs that cannot be computed. We therefore use a smoothing technique to calculate mutual information, so that the smoothed mutual information can be computed for almost any pair of words in the dataset. More importantly, this smoothing technique preserves the original distribution of words in the dataset. We give a theoretical proof of the smoothing technique in the appendix.

2) Based on probability statistics and smoothed mutual information, we propose a new model: positional language models with semantic information. We present the estimation ideas and methods for the unknown parameters of this model, and compare the similarities and differences between positional language models and our model. Finally, we prove that the positional language model is a special case of our model.

3) Our experiments show that the proposed model outperforms the existing model in information retrieval. Further, we give a sensitivity analysis of the parameters of our model, mainly showing how three parameters affect the two retrieval models.
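To illustrate the idea behind smoothed mutual information, the following is a minimal sketch that scores word pairs by pointwise mutual information over document-level co-occurrence counts, using add-alpha (Laplace) smoothing as a stand-in so that pairs never observed together still receive a finite score. The thesis's own smoothing technique (proved in the appendix to preserve the original word distribution) may differ; the function name, the alpha parameter, and the choice of add-alpha smoothing here are assumptions for illustration only.

```python
import math

def smoothed_mi(docs, w1, w2, alpha=1.0):
    """Pointwise mutual information between w1 and w2, estimated from
    document-level co-occurrence, with add-alpha smoothing on the counts.

    Illustrative stand-in only: the thesis's actual smoothing technique
    is not specified here and may differ.
    """
    vocab = {w for d in docs for w in d}
    V = max(len(vocab), 1)
    n = len(docs)
    c1 = sum(1 for d in docs if w1 in d)           # docs containing w1
    c2 = sum(1 for d in docs if w2 in d)           # docs containing w2
    c12 = sum(1 for d in docs if w1 in d and w2 in d)  # docs containing both
    # Add-alpha smoothed probabilities: zero counts become small but
    # nonzero, so the log ratio is always defined.
    p1 = (c1 + alpha) / (n + alpha * V)
    p2 = (c2 + alpha) / (n + alpha * V)
    p12 = (c12 + alpha) / (n + alpha * V * V)
    return math.log(p12 / (p1 * p2))
```

With this smoothing, a pair that co-occurs frequently scores higher than a pair that never co-occurs, while the latter still gets a finite value instead of an undefined log of zero.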
Keywords/Search Tags:positional language models, mutual information, smoothing techniques, information retrieval, semantic relation