Research On Information Retrieval Model Based On Field Theory

Posted on: 2008-04-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W M Yang
Full Text: PDF
GTID: 1118360215996374
Subject: Computer application technology
Abstract/Summary:
Information overload is a problem almost everyone faces in the current era of "information explosion". The sheer volume and complexity of information, together with the limited efficiency of information processing, seriously hinder people from obtaining the information they need. How to retrieve the information a user needs conveniently and accurately has become a focus of attention. Information retrieval is the discipline that studies how to search information in its various forms (text documents, pictures, speech, video and so on) efficiently.

In information retrieval, text documents can be represented by features expressed as words, phrases and semantic concepts. These forms of expression correspond to word-level, sentence-level and document-level representations, and the granularity of the information runs from finer to coarser. From a theoretical point of view, a sentence represents content better than a word, and a document better than a sentence. In practice, however, information should be processed according to the need at hand, and it can be transferred between different granularities.

This thesis analyses document features at different granularities according to different needs. It studies automatic indexing and text categorization, and presents a new information retrieval model based on field theory, which is applied to document retrieval. The results show that the model's definition of document relevance better expresses the relationships between documents.

The main contents of the thesis are:
(1) Quotient spaces of documents. The thesis introduces three layers of granular computing theory and methods. Based on quotient space theory, it builds three quotient spaces of document information and shows how information retrieval is carried out in each space. A rough set method transforms the three spaces from finer to coarser granularity. Finally, the thesis presents two information search strategies.
(2) Automatic document indexing. Guided by M. L. Pao's theory, the thesis builds the index word set, analyses the relevance of index words based on pair set theory, and filters the candidates and expands the document index according to that relevance. On this basis it implements automatic document indexing.
(3) Text categorization. The thesis first applies rough set reduction to the knowledge of index term relevance, then builds a three-layer structure of index word relevance for each document class. A text categorization model based on the index word relevance of the document classes is designed.
(4) Information retrieval model based on field theory. The thesis analyses the classic information retrieval models. Drawing on field theory and combining it with the characteristics of information retrieval, it designs a new information retrieval model based on field theory.

The main innovations of the thesis are:
(1) Drawing on bibliometric theory, the thesis proposes a new scheme for selecting index words. The scheme cuts down the number of candidate index words and reduces the complexity of pre-processing in automatic indexing.
(2) Based on an analysis of index word relevance, the thesis presents a new text categorization model built on that relevance. The model reduces the index word relevance using rough set methods and divides it into three layers, from which a text classifier based on index word relevance is devised.
(3) After analysing the foundations of various information retrieval models, the thesis puts forward a new information retrieval model based on field theory. The model describes document relevance through the interaction between texts, carrying the notion of interaction between matter over to documents.
(4) After analysing the needs, the thesis builds the quotient spaces of sentences and of documents on top of the information space of the words in a document. Using rough set theory, document information at different granularities can be transformed under the granular analysis of quotient spaces, so that information retrieval at different granularities becomes possible.
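
The move from finer to coarser granularity with rough sets can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the toy occurrences, the helper names `partition_by` and `approximations`, and the choice of "occurrences of the word field" as the target set are all assumptions made for illustration. It only shows the standard rough set lower and upper approximations computed over a sentence-level partition and a document-level partition of the same word occurrences.

```python
from collections import defaultdict


def partition_by(items, key):
    """Group items into equivalence classes (granules) that share the same key."""
    classes = defaultdict(set)
    for item in items:
        classes[key(item)].add(item)
    return list(classes.values())


def approximations(granules, target):
    """Return the rough set (lower, upper) approximations of target.

    lower: union of granules entirely contained in target
    upper: union of granules that intersect target
    """
    lower, upper = set(), set()
    for granule in granules:
        if granule <= target:
            lower |= granule
        if granule & target:
            upper |= granule
    return lower, upper


# Hypothetical toy data: word occurrences tagged (document, sentence, word).
occurrences = [
    ("d1", "s1", "field"), ("d1", "s1", "theory"),
    ("d1", "s2", "index"),
    ("d2", "s3", "rough"), ("d2", "s3", "field"),
]

# Assumed target set: every occurrence of the word "field".
relevant = {o for o in occurrences if o[2] == "field"}

# Finer granularity: occurrences in the same sentence are indistinguishable.
sentence_granules = partition_by(occurrences, key=lambda o: (o[0], o[1]))
# Coarser granularity: occurrences in the same document are indistinguishable.
document_granules = partition_by(occurrences, key=lambda o: o[0])

sent_lower, sent_upper = approximations(sentence_granules, relevant)
doc_lower, doc_upper = approximations(document_granules, relevant)

print(len(sent_upper), len(doc_upper))
```

In this toy setting the sentence-level upper approximation covers only the two sentences that contain the word, while the document-level one covers every occurrence in both documents, which is the loss of resolution one would expect when moving to a coarser quotient space.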
Keywords/Search Tags: Information retrieval, Field theory, Quotient space, Pair set analysis, Text indexing, Text categorization