Research On The Language Model Information Retrieval Method Based On Word Co-occurrence

Posted on: 2014-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: X Z Zhao
GTID: 2268330425466553
Subject: Computer application technology
Abstract/Summary:
As computer application technology has matured and Internet applications have developed rapidly, the informatization of society has accelerated, and humanity has entered an era of information explosion. Information retrieval technology, which enables people to quickly find useful information in massive data, emerged in response. To better solve the problems in information retrieval, research has advanced rapidly in areas such as retrieval models, ranking algorithms, document representation models, and query expansion. Among these, the retrieval model has always been the focus of research in the field. The application of language models in particular has greatly promoted the development of retrieval models and produced fruitful research results. However, the traditional language model ignores the potential semantic relatedness between words.

The study in this paper is divided into the following three parts:

1. We mine word co-occurrence in documents through association rules, use the co-occurring words to construct a document-set co-occurrence graph and a document word co-occurrence graph, and discover the semantic relations between the words in a document.

2. We propose a hybrid text keyword extraction method based on word co-occurrence with multiple factors. The various factors that affect keywords are studied and analyzed in detail, and multiple factors are used to compute a basic lexical weighting score. Using the document word co-occurrence graph, we analyze the relations between the words of a document and adjust the lexical weighting scores to complete the keyword extraction. This part provides an important foundation for building the retrieval model.

3. A language model based on word co-occurrence is put forward. The main idea is to mark the theme words in each document of a professional document set and build a domain thesaurus. Each document is divided into two parts: domain subject words and non-domain subject words. For the domain subject words, by analyzing two co-occurrence relations between the words and the subject words in the document, the degree of acquaintance between a word and the thesaurus is estimated, and then the similarity between the query and the document is estimated.

Through experiments, we verify the superiority of the subject extraction method based on word co-occurrence and show that the language model information retrieval method based on word co-occurrence has an advantage in professional domains.
Keywords/Search Tags:information retrieval, language model, word co-occurrence, subject extraction
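
The first part of the study (mining word co-occurrence with association rules and building a co-occurrence graph) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual rule definitions or thresholds, so everything below is an assumption made for illustration: the function name `cooccurrence_graph`, the use of sentences as transactions, the `min_support` and `min_confidence` parameters, and the toy data.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(sentences, min_support=2, min_confidence=0.5):
    """Build a directed word co-occurrence graph from tokenized sentences.

    Hypothetical rule (not taken from the thesis): an edge a -> b is kept
    when a and b appear together in at least `min_support` sentences and
    confidence(a -> b) = count(a, b) / count(a) >= `min_confidence`.
    """
    word_count = Counter()   # number of sentences containing each word
    pair_count = Counter()   # number of sentences containing each word pair
    for tokens in sentences:
        unique = sorted(set(tokens))          # count each word once per sentence
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))

    graph = {}
    for (a, b), n in pair_count.items():
        if n < min_support:
            continue
        # Confidence is asymmetric, so test both directions of the rule.
        for src, dst in ((a, b), (b, a)):
            confidence = n / word_count[src]
            if confidence >= min_confidence:
                graph.setdefault(src, {})[dst] = confidence
    return graph


# Toy corpus, invented for this example.
sentences = [
    ["language", "model", "retrieval"],
    ["language", "model", "smoothing"],
    ["retrieval", "query", "expansion"],
    ["language", "model", "query"],
]
graph = cooccurrence_graph(sentences)
print(graph)
```

With the default thresholds, only the pair ("language", "model") co-occurs often enough to produce edges. Lowering `min_support` to 1 admits weaker, one-directional links, which shows why the confidence check is applied per direction.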