
Research And Application On Information Retrieval Model Based On FCA

Posted on: 2008-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xu
GTID: 2178360215472348
Subject: Applied Mathematics
Abstract/Summary:
Nowadays, using search engines to retrieve information on the Internet has become the most important means by which people obtain information. However, this does not mean that information retrieval techniques already satisfy users. Most query techniques in Chinese information retrieval are based on keyword matching. A "keyword" here is merely a string of characters that appears on a page; the semantic meaning it carries is unused. Page analysis is based on the link relations between pages, which do not reveal the information contained in a page. Therefore, how to express an information need, how to lay out and browse the search structure, and how to build a personalized, intelligent model based on information needs are the directions that future search engines pursue. An intelligent search engine based on conceptual relations is thus the best way to meet the needs of information retrieval. Formal Concept Analysis (FCA) is a field of applied mathematics based on the mathematization of concept and conceptual hierarchy. It thereby activates mathematical thinking for conceptual data analysis and knowledge processing. The core task of FCA is to extract formal concepts and the connections between them from data given in the form of a formal context, so as to form a lattice structure of formal concepts. Concept lattices have been regarded as a perfect abstraction of knowledge systems: the results of data analysis are concepts rather than the data themselves. Unlike traditional data analysis methods based on statistics, FCA yields a knowledge view, which is a higher-level representation of the data. How to apply FCA to information retrieval, and in particular how to build an IR model based on a formal context for the FCA search engine, is the central task of this thesis. This thesis establishes an IR model on the basis of a formal context.
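To make the notion of a formal concept concrete, the following is a minimal sketch, not the thesis's implementation. It assumes a tiny hypothetical context in which objects (here labelled d1 to d3) are mapped to attribute sets, and it enumerates concepts by brute force over object subsets, which is only practical for toy inputs. The names `toy_context`, `common_attributes`, `objects_having`, and `formal_concepts` are illustrative.

```python
from itertools import combinations

# Hypothetical toy formal context: each object is mapped to its attribute set.
toy_context = {
    "d1": {"lattice", "concept"},
    "d2": {"lattice", "search"},
    "d3": {"lattice", "concept", "search"},
}


def common_attributes(objects, context, all_attributes):
    """Return the attributes shared by every object in `objects`.

    For an empty object set this is the full attribute set.
    """
    shared = set(all_attributes)
    for obj in objects:
        shared &= context[obj]
    return shared


def objects_having(attributes, context):
    """Return the objects that possess every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every object subset.

    Brute force: exponential in the number of objects, for illustration only.
    """
    all_attributes = set().union(*context.values())
    object_names = sorted(context)
    concepts = set()
    for size in range(len(object_names) + 1):
        for subset in combinations(object_names, size):
            intent = common_attributes(subset, context, all_attributes)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


concepts = formal_concepts(toy_context)
```

Each resulting pair is an extent (a set of objects) together with an intent (the attributes they all share); ordering the pairs by inclusion of extents gives the concept lattice. An incremental method is preferable when the context grows over time.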
The document set is defined as the object set, and the set of keywords that characterize the documents is defined as the attribute set. The hierarchical relationships that reflect the connections between documents and keywords, and among the documents themselves, can be extracted from the context. A lattice is then constructed from the context. The lattice expresses these relationships clearly, and users can browse it. Since both the object set and the attribute set grow dynamically, we use the Godin algorithm to build the lattice. The context determines the structure of the lattice as well as the users' browsing efficiency, and ultimately determines the precision and recall of the FCASE system. Therefore, how to build the formal context, that is, how to establish the IR model for the FCASE system, is the key to the whole system. In particular, the choice of attributes directly determines the capability of the search engine. This thesis proposes an attribute-abstracting algorithm for the IR model. The basic idea of the algorithm is as follows: first, segment the text into words and count the frequency of each word in the document set; second, calculate each word's tf*idf value and compute its weight according to the value-adjusting rules; finally, choose a suitable threshold λ to limit the number of attributes used to construct the formal context. Test data confirm the feasibility of the attribute-abstracting algorithm in refining the context and constructing an IR model based on FCA. The advantage of the IR model based on a formal context lies in how it organizes data, that is, in reflecting the latent relations between documents.
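The attribute-selection steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis's algorithm: the documents are assumed to be already segmented into word lists, tf is taken as relative frequency within a document, idf as the natural logarithm of (number of documents / document frequency), and the value-adjusting rules, which the abstract does not specify, are omitted. The function name `select_attributes`, the sample data, and the threshold value are hypothetical.

```python
import math
from collections import Counter


def select_attributes(docs, lam):
    """Build a formal context {document: set of keywords} from segmented docs.

    docs: mapping of document name to a list of already-segmented words.
    lam:  threshold on the tf*idf weight; words below it are not kept.
    Assumption: no weight adjustment is applied (the rules are not given).
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter()
    for words in docs.values():
        doc_freq.update(set(words))

    context = {}
    for name, words in docs.items():
        counts = Counter(words)
        total = len(words)
        kept = set()
        for word, count in counts.items():
            tf = count / total
            idf = math.log(n_docs / doc_freq[word])
            if tf * idf >= lam:
                kept.add(word)
        context[name] = kept
    return context


# Hypothetical pre-segmented documents.
sample_docs = {
    "d1": ["fca", "lattice", "lattice", "search"],
    "d2": ["fca", "search", "engine"],
    "d3": ["fca", "lattice", "context"],
}
sample_context = select_attributes(sample_docs, lam=0.12)
```

In this sketch a word that appears in every document receives an idf of zero and is never kept, which matches the intuition that such a word does not distinguish documents. Raising λ shrinks the attribute set and therefore the lattice built from the context.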
Combined with the context reduction method, the model provides users with a practical, lattice-based way of browsing. The practical value and function of the FCA IR model are demonstrated in the FCA search engine (FCASE) system. The four main contributions of this dissertation are as follows: (1) proposing the IR model based on FCA and demonstrating its efficiency and validity through tests; (2) putting forward a context constituted with documents as objects and keywords as attributes, and achieving attribute abstraction for the context; (3) implementing the attribute-abstracting algorithm according to the idea of term optimization; (4) applying the IR model in the FCASE system. The successful running of the system validates the feasibility of the IR model and the practicality of the attribute-abstracting algorithm.
Keywords/Search Tags: formal concept analysis, concept lattice, formal context, search engine, attribute abstraction