Research On The Language Model Information Retrieval Method Based On Word Co-occurrence

Posted on:2014-02-25

Degree:Master

Type:Thesis

Country:China

Candidate:X Z Zhao

Full Text:PDF

GTID:2268330425466553

Subject:Computer application technology

Abstract/Summary:

As computer application technology has matured and Internet applications have developed rapidly, the informatization of society has accelerated, and humanity has entered an era of information explosion. Information retrieval technology, which enables people to quickly find useful information in massive data, emerged in response. To better solve the problems that remain in information retrieval, research has advanced rapidly in areas such as retrieval models, ranking algorithms, document representation models, and query expansion. Among these, the retrieval model has always been the focus of research in the field. In particular, the application of language models has greatly promoted the development of retrieval models and produced fruitful research results. However, the traditional language model ignores the potential semantic relatedness between words.

The study in this thesis is divided into the following three parts:

1. We mine word co-occurrence in documents using association rules, use the co-occurring words to construct a document-set co-occurrence graph and a document word co-occurrence graph, and discover the semantic relations between words in a document.
2. We propose a hybrid text keyword extraction method based on word co-occurrence and multiple factors. The various factors that affect keywords are studied and analyzed in detail, and multiple factors are used to compute a basic lexical weighting score. Through the document word co-occurrence graph, we analyze the relations between document words and adjust the lexical weighting scores to complete keyword extraction. This part provides an important foundation for building the retrieval model.
3. We put forward a language model based on word co-occurrence. The main idea is to mark subject words in each document of a professional document set and to build a domain thesaurus. Each document is divided into two parts: domain subject words and non-domain subject words. For the domain subject words, by analyzing two co-occurrence relations between words and subject words in the document, the degree of association ("acquaintance degree") between a word and the thesaurus is estimated, and then the similarity between the query and the document is estimated.

Experiments verify the superiority of the subject extraction method based on word co-occurrence and show that the language model information retrieval method based on word co-occurrence has an advantage in professional domains.
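
The abstract does not give the details of how the co-occurrence graph is built, so the following is only a minimal sketch of one common way to mine word co-occurrence with association-rule style thresholds. It assumes each sentence is a transaction, and the function name `build_cooccurrence_graph`, the `min_support` and `min_confidence` parameters, and the toy data are all hypothetical choices made for illustration, not the thesis's actual method.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_graph(sentences, min_support=2, min_confidence=0.5):
    """Return {word: {neighbor: confidence}} for word pairs passing both thresholds.

    Assumption: each tokenized sentence is one transaction. Support is the
    number of sentences containing both words; confidence(a -> b) is
    support(a, b) / support(a).
    """
    word_count = Counter()
    pair_count = Counter()
    for tokens in sentences:
        unique = sorted(set(tokens))  # count each word once per sentence
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))

    graph = defaultdict(dict)
    for (a, b), n_ab in pair_count.items():
        if n_ab < min_support:
            continue
        # Check the rule in both directions; edges are directed.
        for src, dst in ((a, b), (b, a)):
            confidence = n_ab / word_count[src]
            if confidence >= min_confidence:
                graph[src][dst] = confidence
    return dict(graph)


# Toy, made-up input for illustration only.
sentences = [
    ["language", "model", "retrieval"],
    ["language", "model", "smoothing"],
    ["retrieval", "query", "expansion"],
    ["language", "retrieval"],
]
graph = build_cooccurrence_graph(sentences)
for word, neighbors in sorted(graph.items()):
    print(word, neighbors)
```

In this sketch the edge weights are rule confidences, so the graph is directed: a rare word can point strongly to a frequent one without the reverse holding. A keyword-scoring step like the one described in part 2 could then adjust a word's base weight using its neighbors in this graph, though how the thesis does so is not stated here.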

Keywords/Search Tags:

information retrieval, language model, word co-occurrence, subject extraction

Related items

1	Using Statistical Language Modeling For Ad Hoc Information Retrieval
2	Title Classification Research Of Collected Documents Based On Subject Matching
3	Research On Information Retrieval Language Under The Conditions Of Network
4	Co-occurrence Distance And Query Expansion Based Mongolian Information Retrieval System
5	Research On Language Modeling Based Sentence Retrieval
6	Bbs Spam Filtering Model Based On Word Co-ocurrence
7	BBS Spam Filtering Model Based On Word Co-ocurrence
8	Research On Keyword Extraction And Improved LSA Based On Co-occurrence Word
9	Chinese Word Semantic Similarity Measure And Its Application In Cross-language Information Retrieval
10	Research On Domain-Specific Term Extraction Based On Semi-Supervised Learning