The Research On Chinese Document Clustering Technology Based On Ontology

Posted on:2009-09-29

Degree:Master

Type:Thesis

Country:China

Candidate:C L Yang

Full Text:PDF

GTID:2178360275961266

Subject:Computer application technology

Abstract/Summary:


With the development of Internet technology and of the means of information communication, the information that people can obtain online keeps growing, and the volume of documents in particular is enormous. To help people manage and retrieve these vast text resources effectively, navigating, summarizing and organizing documents has become a pressing problem in computer science and information science.

As document clustering technology has become better understood and more widely applied, it has become clear that the classical vector-space model, which uses keywords as features, does not address the particular problems of document clustering: the high dimensionality of the data, synonyms, and polysemous words. These problems significantly affect both the efficiency of document clustering algorithms and the quality of their results. Ontology theory offers a way to address them.

Ontology, a notion that originates in philosophy, is an advanced knowledge representation technology in AI. It uses concepts and the relations between concepts to describe abstract facts and build models. In recent years ontology has attracted wide attention in the information field and has been used extensively in the Semantic Web, search engines, e-commerce, natural language processing, knowledge engineering, information extraction, multi-agent systems, database design and digital libraries, among others.

This thesis studies Chinese document clustering based on the HowNet ontology library and proposes a new method for Chinese document clustering. Using HowNet as background knowledge, the method handles synonyms and polysemous words and maps individual words to concepts. These concepts are then clustered with the Chameleon algorithm, and document clustering is performed on the basis of the concept clustering results. The algorithm adopts the idea of continuous clustering to arrive at the final document clustering. Documents are represented by concept vectors, which reduces the dependence relations among documents and lowers the computational complexity of the document clustering algorithm.
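
The word-to-concept mapping described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the `WORD_TO_CONCEPT` table is a hypothetical stand-in for a HowNet lookup, the English entries are invented for readability, and the helper names `concept_vector` and `cosine` are assumptions. The sketch covers only synonym merging; it does not attempt polysemy handling or the Chameleon clustering step.

```python
from collections import Counter
from math import sqrt

# Hypothetical word-to-concept table standing in for a HowNet lookup.
# Real HowNet entries describe Chinese words; these English pairs are
# invented for illustration only.
WORD_TO_CONCEPT = {
    "car": "vehicle",
    "automobile": "vehicle",
    "truck": "vehicle",
    "doctor": "medical_worker",
    "physician": "medical_worker",
    "hospital": "medical_institution",
}


def concept_vector(tokens):
    """Count concepts instead of surface words; unmapped tokens are dropped."""
    return Counter(WORD_TO_CONCEPT[t] for t in tokens if t in WORD_TO_CONCEPT)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = ["car", "doctor", "hospital"]
doc2 = ["automobile", "physician", "hospital"]

# Keyword features: the two documents share only one surface word.
word_sim = cosine(Counter(doc1), Counter(doc2))
# Concept features: synonyms collapse onto the same dimension.
concept_sim = cosine(concept_vector(doc1), concept_vector(doc2))

print(f"word-level similarity:    {word_sim:.3f}")
print(f"concept-level similarity: {concept_sim:.3f}")
```

In this toy case the two documents share a single keyword but all three concepts, so the concept vectors are both shorter and closer than the keyword vectors. This is the effect the abstract attributes to the concept vector representation.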

Keywords/Search Tags:

Document clustering, Ontology, HowNet, Concept vector-space model, Chameleon algorithm


Related items

1	Research On Text Clustering Based On Hownet
2	Research On A Concept Vector Model Of Documents Based On Ontology
3	Research Of Text Clustering On Food Complaint Documents Based On Ontology
4	Research On Automatic Multi-document Summarization Based On Statistics And Semantic Analysis
5	Research And Implementation On Semi-Automatic Domain Ontology Acquisition Method
6	Research On Ontology-based Semantic Retrieval
7	Research On Chinese Texts Clustering
8	Compare Analysis Of Document Clustering Algorithm For Large Data Set And The Application In Sense Induction
9	The Research Of Enterprise Document Retrieval Model Based On Ontology
10	Theory And Practice Of AP Algorithm And Chameleon Clustering