Study On New Data And Text Clustering Methods Based On Representatives

Posted on: 2007-04-24 | Degree: Master | Type: Thesis
Country: China | Candidate: X S Wang | Full Text: PDF
GTID: 2178360212480625 | Subject: Systems Engineering
Abstract/Summary:
Clustering is an efficient method for data mining and text mining. Improving clustering methods and their performance, so that they better meet the requirements of advancing data mining and text mining techniques, is of both theoretical and practical significance. This thesis presents two new methods: a data clustering method based on density clustering with representatives, and a text clustering method based on hierarchical clustering with representatives.

The first contribution is a new efficient data clustering method based on density clustering with representatives. The method first finds the representatives and calculates their densities, then introduces the density information into the distance computed between each pair of representatives by means of a new distance formula. The nearest pairs of representatives are treated as adjacent points and linked by an edge. The resulting set of representatives is described by an undirected graph, and the representatives that lie in the same connected subgraph are found with a breadth-first search to obtain the final clustering. Because the new distance formula takes the density of the representative points into account, the clustering result is more precise than that of existing similar methods. The method also removes the need to set the number of clusters in advance. Only a density threshold has to be set, which is easier for users and does not influence the clustering results. The method is more efficient than traditional methods such as CURE, so it is suitable for clustering large-scale, high-dimensional data.

The second contribution is a new efficient text clustering method based on hierarchical clustering with representative points. The method divides the data to be clustered into many partitions and clusters the partitions from the bottom up.
Compared with traditional methods of the same kind, the proposed method is not only faster but can also recognize clusters of arbitrary shape and size and filter out noisy data. It is suitable for clustering text with high-dimensional features.
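The graph step of the density-based method can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not give the new distance formula, so `adjusted_distance` is a hypothetical stand-in that shrinks the Euclidean distance between two dense representatives. Using the density threshold to discard sparse representatives is likewise an assumption, as are the function names and the choice to link each representative to its single nearest neighbour. Densities are assumed to be strictly positive.

```python
from collections import deque
from math import dist, sqrt


def adjusted_distance(p, q, dens_p, dens_q):
    # Hypothetical density-aware distance (NOT the thesis's formula):
    # Euclidean distance divided by the geometric mean of the two densities.
    return dist(p, q) / sqrt(dens_p * dens_q)


def cluster_representatives(points, densities, density_threshold):
    # Assumed role of the threshold: ignore representatives that are too sparse.
    kept = [i for i, d in enumerate(densities) if d >= density_threshold]

    # Undirected graph: link each kept representative to its nearest
    # kept neighbour under the density-aware distance.
    graph = {i: set() for i in kept}
    for i in kept:
        others = [j for j in kept if j != i]
        if not others:
            continue
        nearest = min(
            others,
            key=lambda j: adjusted_distance(
                points[i], points[j], densities[i], densities[j]
            ),
        )
        graph[i].add(nearest)
        graph[nearest].add(i)

    # Breadth-first search: each connected subgraph becomes one cluster.
    clusters, seen = [], set()
    for start in kept:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in sorted(graph[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(sorted(component))
    return clusters
```

The function returns lists of indices into `points`, one list per connected subgraph. The number of clusters is never passed in: it follows from the graph structure, which matches the abstract's claim that only a density threshold needs to be set.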
Keywords/Search Tags: Representative points clustering, density clustering, hierarchy clustering, text clustering