Research And Application Of Document Clustering Based On Ontology

Posted on: 2013-02-08 | Degree: Master | Type: Thesis
Country: China | Candidate: L Lin | Full Text: PDF
GTID: 2218330362460702 | Subject: Computer Science and Technology
Abstract/Summary:
With the widespread use of personal computers and the further development of network technology, more and more people store their information and data on computers and in network space as electronic text. The massive growth of these texts makes it difficult for people to filter out and select the information that is useful for their work. How to uncover hidden, potentially useful information, and then give people guidance and clues by comparing these texts and analyzing the relevance of the documents, has become a serious problem.

Text clustering groups a set of texts into a collection of several text clusters. It is an unsupervised text processing method. As one of the main methods in document mining, document clustering is used frequently and effectively in information retrieval. Partitioning large-scale, unorganized text information is an important application of document clustering research. This paper mainly applies document clustering technology to a large body of text information collected by a company, and provides the company with clues through analysis of that information.

This paper describes the concept of document clustering and, based on the specific application requirements, gives a detailed introduction to the steps of the clustering process, including text preprocessing, feature selection, text vector representation and feature-word weighting. To address the problem that relationships between feature words are ignored in the document clustering process, this paper draws on ontology knowledge. HowNet serves as the background knowledge: it is used to merge synonyms in the documents after preprocessing, in order to reduce the dimension of the text vector representation and improve the document clustering results.
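The synonym-merging step can be illustrated with a minimal sketch. The dictionary `SYNONYM_TO_CONCEPT` is a hypothetical, hand-written stand-in for a HowNet lookup (the abstract does not describe the actual HowNet interface), the English tokens are toy data, and TF-IDF is assumed here as a common weighting scheme, not confirmed as the one used in the thesis.

```python
import math
from collections import Counter

# Hypothetical stand-in for a HowNet synonym lookup: word -> canonical concept.
SYNONYM_TO_CONCEPT = {
    "automobile": "car",
    "vehicle": "car",
    "purchase": "buy",
}

def merge_synonyms(tokens):
    """Replace each token by its canonical concept, if one is known."""
    return [SYNONYM_TO_CONCEPT.get(t, t) for t in tokens]

def tfidf_vectors(docs):
    """Return the sorted vocabulary and one TF-IDF vector per non-empty document."""
    vocab = sorted({t for doc in docs for t in doc})
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([(tf[t] / len(doc)) * math.log(n / df[t]) for t in vocab])
    return vocab, vectors

# Toy, already-tokenized documents (assumed output of preprocessing).
raw_docs = [
    ["buy", "automobile", "today"],
    ["purchase", "car", "today"],
    ["sell", "vehicle", "tomorrow"],
]
merged_docs = [merge_synonyms(d) for d in raw_docs]

vocab_raw, _ = tfidf_vectors(raw_docs)
vocab_merged, vectors_merged = tfidf_vectors(merged_docs)

# Vector dimension before and after merging synonyms.
print(len(vocab_raw), len(vocab_merged))  # 8 5
```

In this toy data the first two documents become identical after merging, so their vectors coincide, which is the kind of relationship between feature words that a purely word-based representation would miss.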
In addition, this paper describes a variety of clustering algorithms in detail, along with the advantages and disadvantages of each, so that researchers can select an appropriate algorithm for their clustering work. The K-means, DBSCAN and COBWEB algorithms are selected in this paper.

Experiments are carried out in this paper to apply document clustering technology to the texts collected by a company. Through document clustering, the set of documents is grouped into clusters. This experiment achieves better results.
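Of the three selected algorithms, K-means is the simplest to sketch. The version below is a minimal illustration, not the thesis's implementation: the function name `kmeans`, the fixed iteration count, the choice of the first `k` points as initial centroids, and the toy 2-D vectors are all assumptions made for the example. DBSCAN and COBWEB are not shown.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm); the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy 2-D "document vectors"; real text vectors have one dimension per feature word.
doc_vectors = [(0.0, 0.1), (0.9, 1.0), (0.1, 0.0), (1.0, 0.9)]
labels, centroids = kmeans(doc_vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

A deterministic initialization keeps the example reproducible; practical K-means runs usually use random or k-means++ seeding and stop when assignments no longer change.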
Keywords/Search Tags: clustering application, document clustering, clustering algorithm, ontology