Study On New Data And Text Clustering Methods Based On Representatives

Posted on:2007-04-24

Degree:Master

Type:Thesis

Country:China

Candidate:X S Wang

Full Text:PDF

GTID:2178360212480625

Subject:Systems Engineering

Abstract/Summary:

Clustering is an effective technique in data mining and text mining. Improving clustering methods and their performance, so that they better meet the demands of advancing data mining and text mining techniques, has both theoretical and practical significance. This thesis presents two new methods: a data clustering method based on density clustering with representatives, and a text clustering method based on hierarchical clustering with representatives.

The first contribution is an efficient data clustering method based on density clustering with representatives. The method first selects representatives and calculates their densities, then incorporates the density information into the distance between each pair of representatives through a new distance formula. The nearest pairs of representatives, called abut-points (adjacent points), are linked by an edge. The resulting set of representatives is described by an undirected graph, and the representatives that lie in the same connected subgraph are found by breadth-first search to give the final clusters. Because the new distance formula takes the density of the representative points into account, the clustering result is more precise than that of existing similar methods. The method also removes the need to set the number of clusters in advance: the user sets only a density threshold, which is easier to choose and does not affect the clustering results. The method is more efficient than traditional methods such as CURE, so it is suitable for clustering large-scale, high-dimensional data.

The second contribution is an efficient text clustering method based on hierarchical clustering with representative points. The method divides the data to be clustered into many partitions and clusters the partitions from the bottom up. Compared with traditional similar methods, it not only runs faster but can also recognize clusters of arbitrary shape and size and filter out noisy data. It is suitable for clustering text with high-dimensional features.
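The steps of the density-based method (representatives, density, density-aware distance, graph, connected subgraphs) can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not give the distance formula, so the `adjusted` distance below is a hypothetical stand-in, and the `radius` and `max_link` parameters as well as all function names are invented for this example. Representatives are taken as given rather than selected.

```python
from collections import deque
from math import dist


def density(points, reps, radius):
    """Count the data points within `radius` of each representative."""
    return [sum(1 for p in points if dist(p, r) <= radius) for r in reps]


def link_representatives(reps, dens, min_density, max_link):
    """Build an undirected graph over the sufficiently dense representatives."""
    # Representatives below the density threshold are treated as noise.
    graph = {i: set() for i, d in enumerate(dens) if d >= min_density}
    for i in graph:
        for j in graph:
            if i < j:
                # Hypothetical density-aware distance (NOT the thesis formula):
                # the Euclidean distance grows when the two densities differ.
                ratio = max(dens[i], dens[j]) / max(min(dens[i], dens[j]), 1)
                adjusted = dist(reps[i], reps[j]) * ratio
                if adjusted <= max_link:
                    graph[i].add(j)
                    graph[j].add(i)
    return graph


def connected_components(graph):
    """Group nodes of an undirected graph with breadth-first search."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(sorted(component))
    return clusters


def cluster_representatives(points, reps, radius, min_density, max_link):
    """Return clusters as sorted lists of representative indices."""
    dens = density(points, reps, radius)
    graph = link_representatives(reps, dens, min_density, max_link)
    return connected_components(graph)


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11), (5, 5)]
    reps = [(0, 0), (1, 1), (10, 10), (11, 11), (5, 5)]
    print(cluster_representatives(points, reps, radius=1.5,
                                  min_density=2, max_link=3.0))
```

On this toy data the isolated representative at (5, 5) falls below the density threshold and is dropped, and the remaining representatives form two connected subgraphs, so the call prints `[[0, 1], [2, 3]]`. The number of clusters is never passed in; it emerges from the graph.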

Keywords/Search Tags:

Representative points clustering, density clustering, hierarchy clustering, text clustering

Related items

1	Research And Application Of Density Peak Clustering Algorithm Based On Natural Neighbors And Representative Points
2	The Research And Application Of Text Clustering Based On Improved K-means Algorithm
3	Research On Density Peak-based Clustering Algorithm And Its Parallel Implementation
4	Research On Clustering Algorithm Based On Density Peak And Its Application In Text Clustering
5	Research Of Distributed Clustering Algorithm Based On Density
6	Density Clustering Algorithm Based On Improved Support Vector Machine
7	Study On Residual Error-Based Clustering Algorithms
8	The Research And Application Of Clustering Algorithm Based On Density
9	Research On Density Peaks Clustering
10	Research On Density-based Hierarchical Clustering Algorithm