
The Research On Chinese Document Clustering Technology Based On Ontology

Posted on: 2009-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: C L Yang
GTID: 2178360275961266
Subject: Computer application technology
Abstract/Summary:
With the development of Internet technology and of the means of communication, the information available online keeps growing, and text documents in particular exist in enormous numbers. To help people manage and retrieve these text resources effectively, how to navigate, summarize, and organize such documents has become a pressing problem in computer science and information science.

As document clustering has become better understood and more widely applied, it has become clear that the classical vector-space model, which uses keywords as features, does not address the particular problems of document clustering: high dimensionality of the data, synonyms, and polysemous words. These problems significantly degrade both the efficiency of document clustering algorithms and the quality of their results. Ontology theory offers a way to address them.

Ontology, a notion that originated in philosophy, is used in AI as a technique for knowledge representation. It uses concepts and the relations between them to describe abstract facts and build models. In recent years ontology has attracted wide attention in the information field and has been used extensively in the Semantic Web, search engines, e-commerce, natural language processing, knowledge engineering, information extraction, multi-agent systems, database design, digital libraries, and other areas.

This thesis studies Chinese document clustering based on the HowNet ontology library and proposes a new method for it. The method uses HowNet as background knowledge, handles synonyms and polysemous words, and maps individual words to concepts. These concepts are then clustered with the Chameleon algorithm, and document clustering is performed on the basis of the concept clustering results. The algorithm thus clusters in successive stages, first concepts and then documents, to arrive at the final document clustering. Each document is represented by a concept vector, which reduces the dependence relations among documents. The computational complexity of the document clustering algorithm is reduced effectively.
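
The effect of replacing keyword features with concept features can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the lexicon `TOY_CONCEPTS` is a hypothetical hand-written stand-in for HowNet (it is not HowNet data and uses English words for readability), the function names are invented here, and polysemy handling and the Chameleon clustering step are not reproduced.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for HowNet: surface word -> concept label.
# Synonyms share one concept, so the feature space shrinks.
TOY_CONCEPTS = {
    "computer": "computer",
    "PC": "computer",
    "laptop": "computer",
    "doctor": "medical_worker",
    "physician": "medical_worker",
    "hospital": "medical_institution",
}


def to_concept_vector(tokens, lexicon):
    """Count concepts instead of surface words; words missing from the lexicon are skipped."""
    return Counter(lexicon[t] for t in tokens if t in lexicon)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counter objects)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two already-tokenized documents that share no surface word.
doc1 = ["PC", "laptop", "doctor"]
doc2 = ["computer", "physician", "hospital"]

keyword_sim = cosine(Counter(doc1), Counter(doc2))
concept_sim = cosine(
    to_concept_vector(doc1, TOY_CONCEPTS),
    to_concept_vector(doc2, TOY_CONCEPTS),
)

print(f"keyword similarity: {keyword_sim:.3f}")
print(f"concept similarity: {concept_sim:.3f}")
```

In this toy case the two documents have no keyword in common, so their keyword vectors are orthogonal, while their concept vectors overlap because the synonyms collapse onto shared concepts. A real system would additionally need Chinese word segmentation and word-sense selection before the mapping step.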
Keywords/Search Tags: Document clustering, Ontology, HowNet, Concept vector-space model, Chameleon algorithm