Research On Dependency Language Model For Information Retrieval

Posted on: 2005-12-13
Degree: Master
Type: Thesis
Country: China
Candidate: G Y Wu
Full Text: PDF
GTID: 2168360122987476
Subject: Computer application technology
Abstract/Summary:
The statistical language model emerged in the 1980s. After more than twenty years of development, it has spread to every area of computational linguistics and produced strong results in many fields, for instance speech recognition, handwriting recognition, machine translation, information retrieval, Chinese word segmentation, and Asian language input. However, the traditional statistical language model, the n-gram model, considers only the relations among n neighboring words, predicting the next word from the preceding words. It therefore loses much valuable information during training, such as syntactic relations, term collocation and co-occurrence information, and word linkage information, and this hurts model performance.

This thesis presents a new dependency language modeling approach to information retrieval. The approach extends the basic unigram-based language modeling approach by relaxing the independence assumption. We integrate the linkage of a query as a hidden variable, which expresses the term dependencies within the query as an acyclic, planar, undirected graph. We then assume that a query is generated from a document in two stages: the linkage is generated first, and then each term is generated in turn, depending on other related terms according to the linkage.

We also present a statistical smoothing method for model parameter estimation and an approach to learning the linkage of a sentence in an unsupervised manner. Together, these make it possible to apply the dependency model to information retrieval successfully.

Our results show that our model achieves substantial and significant improvements on TREC collections over the unigram language model and the classical probabilistic model.
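
The two-stage generation of a query from a document, with the linkage treated as a hidden variable, can be sketched in probabilistic notation. This is only an illustrative reading of the abstract, not the thesis's own formulation: the symbols Q (query), D (document), L (linkage) and q_i (the i-th query term) are assumed here, and the exact factorization of the second stage is not stated in the abstract, so it is left unexpanded.

```latex
% Unigram baseline: query terms are generated independently given the document.
P(Q \mid D) = \prod_{i=1}^{|Q|} P(q_i \mid D)

% Dependency model (sketch): the linkage L is generated first, then the
% query terms are generated conditioned on L; L is hidden, so it is summed out.
P(Q \mid D) = \sum_{L} P(L \mid D)\, P(Q \mid L, D)
```

In this reading, relaxing the independence assumption means that P(Q | L, D) no longer has to factor into per-term probabilities that ignore the other terms; each term may depend on the terms it is linked to in L.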
Keywords/Search Tags: language model, n-gram model, dependency language model, statistical smoothing, information retrieval