The Application Of Formal Concept Analysis And Ontology In Text Mining

Posted on: 2009-12-25
Degree: Master
Type: Thesis
Country: China
Candidate: M Z Tang
GTID: 2178360245956775
Subject: Computer application technology
Abstract/Summary:
With the rapid development and spread of the Internet, the volume of electronic information is increasing dramatically. How to collect and locate the information that interests a user, and how to discover latent, useful knowledge quickly, accurately, and completely, has become a hot topic in information science and technology. Data mining (DM) is a research field that addresses this problem. Structured data, such as relational databases, is the main research object of DM, but in practice most information exists as unstructured data. Mining unstructured information has therefore become a new challenge that follows on from DM.

Text is the most widely used form of unstructured data. It appears in digital libraries, product catalogs, newsgroups, medical reports, and organizational and personal home pages. Text mining techniques are also widely applied in natural language understanding, automatic text summarization, information extraction, information filtering, information retrieval, and related fields, so the commercial value of text mining is higher than that of DM.

Formal Concept Analysis (FCA) is an order-theoretic method for the mathematical analysis of scientific data, introduced by R. Wille in 1982. Its core data structure is the concept lattice, which can be used to discover implications among objects and attributes and to make explicit the abstraction-instance (generalization and specialization) relationships between concepts. FCA is now widely applied in machine learning, information retrieval, software engineering, and other fields.

An ontology is a formal specification of a shared conceptualization. It serves as a modeling tool for describing information systems at the semantic and knowledge level, and it is widely applied in computing fields such as knowledge engineering, digital libraries, software reuse, information retrieval, heterogeneous information processing on the Web, and the Semantic Web.

This thesis takes text data as its object of study and investigates the application of formal concept analysis and ontology in text mining, including text feature extraction, text clustering, and text classification. The main contributions are as follows.

(1) Text clustering is an important method in text mining. A novel multi-context fuzzy text clustering method and its model, based on formal concept analysis and concept similarity, are proposed. The method takes into account the semantic relationships between keywords across multiple contexts and derives the fuzzy similarity matrix from a non-distance-based computation. Different clustering results can be obtained to suit different requirements, which makes the approach more flexible. An example is given to illustrate the feasibility of the algorithm.

(2) Text classification plays an important role in text mining and document management. The core ontology WordNet is introduced in text preprocessing to enrich the text representation and improve its generality. The KNN algorithm is then used to classify the documents. Experimental results on the Reuters-21578 corpus show that some ontology-based strategies achieve better classification performance than the method without an ontology.
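
To make the concept lattice notion in the abstract more concrete, the following minimal Python sketch enumerates the formal concepts of a tiny document-keyword context. It is not the method proposed in the thesis. The documents, the keywords, and the function names (`common_attributes`, `common_objects`, `formal_concepts`) are hypothetical and chosen only for illustration. The brute-force enumeration is practical only for very small contexts.

```python
from itertools import combinations

# Hypothetical toy formal context: documents (objects) and the
# keywords (attributes) each document contains.
CONTEXT = {
    "doc1": {"lattice", "ontology"},
    "doc2": {"lattice", "clustering"},
    "doc3": {"lattice", "ontology", "clustering"},
}


def common_attributes(objects, context):
    """Return the attributes shared by every object in `objects`.

    For an empty object set this is the full attribute set.
    """
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def common_objects(attributes, context):
    """Return the objects that have every attribute in `attributes`."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) by brute force.

    Each subset of objects is closed: its intent is derived first, and
    the extent is then recomputed from that intent. Duplicates collapse
    because the results are stored in a set.
    """
    concepts = set()
    objects = sorted(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset, context)
            extent = common_objects(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


if __name__ == "__main__":
    # Print concepts from the most specific (smallest extent) upward.
    for extent, intent in sorted(formal_concepts(CONTEXT),
                                 key=lambda c: (len(c[0]), sorted(c[0]))):
        print(sorted(extent), sorted(intent))
```

Ordering the resulting concepts by inclusion of their extents gives the concept lattice. In this toy context the concept whose intent is only `lattice` covers all three documents and sits above the two more specific concepts that add `ontology` or `clustering`.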
Keywords/Search Tags: concept lattice, ontology, text mining, text clustering, text classification