Research And Application Of Document Clustering Based On Ontology

Posted on:2013-02-08

Degree:Master

Type:Thesis

Country:China

Candidate:L Lin

Full Text:PDF

GTID:2218330362460702

Subject:Computer Science and Technology

Abstract/Summary:

With the widespread use of personal computers and the further development of network technology, more and more people store their information and data on computers and in network space as electronic text. The massive growth of these texts makes it difficult for people to filter out the information that is useful for their work. How to uncover hidden, potentially useful information and give people guidance and clues, by comparing these texts and analyzing the relevance between documents, has become a serious problem.

Text clustering groups a set of texts into several text clusters. It is an unsupervised text processing method. As one of the main methods of document mining, document clustering is used frequently and effectively in information retrieval. Partitioning large-scale, unorganized text information is an important application of document clustering. This thesis applies document clustering to a large body of text information collected by a company and provides the company with clues through analysis of that information.

This thesis describes the concept of document clustering and, based on the specific application requirements, gives a detailed introduction to the steps of the clustering process, including text preprocessing, feature selection, text vector representation, and feature-word weighting. To address the problem that relationships between feature words are ignored in the document clustering process, this thesis draws on ontology knowledge. HowNet serves as the background knowledge base and is used to merge synonyms in the documents after preprocessing, in order to reduce the dimensionality of the text vector representation and to improve the clustering results.

In addition, this thesis describes a variety of clustering algorithms in detail, along with the advantages and disadvantages of each, so that researchers can select an appropriate algorithm for their clustering work. The K-means, DBSCAN, and COBWEB algorithms are selected in this thesis. Experiments apply document clustering to the texts collected by a company, gathering the set of documents into clusters. The experiments achieve better results.
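
The synonym-merging step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the `SYNONYMS` table is a hypothetical stand-in for a HowNet lookup, the toy documents are invented for illustration, and the TF-IDF weighting with a smoothed IDF is an assumed choice, since the abstract does not state which weighting scheme is used.

```python
import math
from collections import Counter

# Hypothetical synonym table (assumption); a real system would derive it from HowNet.
SYNONYMS = {"automobile": "car", "purchase": "buy"}


def merge_synonyms(tokens, synonyms=SYNONYMS):
    """Map each token to its canonical form, leaving unknown tokens unchanged."""
    return [synonyms.get(token, token) for token in tokens]


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) using term frequency times a smoothed IDF."""
    vocab = sorted({token for doc in docs for token in doc})
    n_docs = len(docs)
    doc_freq = Counter(token for doc in docs for token in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([
            (counts[term] / len(doc))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term in vocab
        ])
    return vocab, vectors


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy, already-tokenized documents (invented for illustration).
docs = [
    ["buy", "car", "today"],
    ["purchase", "automobile", "today"],
    ["rain", "forecast", "today"],
]

raw_vocab, raw_vecs = tfidf_vectors(docs)
merged_docs = [merge_synonyms(doc) for doc in docs]
merged_vocab, merged_vecs = tfidf_vectors(merged_docs)

print("vocabulary size:", len(raw_vocab), "->", len(merged_vocab))
print("similarity of docs 0 and 1 before merging:", cosine(raw_vecs[0], raw_vecs[1]))
print("similarity of docs 0 and 1 after merging:", cosine(merged_vecs[0], merged_vecs[1]))
```

In this toy case, merging shrinks the vocabulary and makes the first two documents, which differ only in synonyms, look alike to any distance-based clustering algorithm such as K-means or DBSCAN.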

Keywords/Search Tags:

clustering application, document clustering, clustering algorithm, ontology

Related items

1	The Research On Chinese Document Clustering Technology Based On Ontology
2	Research On Efficient Document Clustering Using Improvised Sub-Document Based Framework
3	Research Of The XML Document Clustering Using GA
4	Clustering Algorithm In The Web Mining Applications
5	Semantic frameworks for document and ontology clustering
6	Design And Implementation An Of Document Clustering Algorithm Based On The GPU
7	Research And Application Of Subject-oriented Document Resource Clustering
8	A Deep Embedding Clustering Algorithm Considering Preservation Of Initial Clustering Structure And Its Application
9	Application Of Sub-fuzzy C-means Algorithm In Document Clustering
10	Research On Parallel Non-Intervention Document Clustering Algorithm