
Study On Clustering Algorithm

Posted on: 2007-12-03
Degree: Master
Type: Thesis
Country: China
Candidate: G B Li
Full Text: PDF
GTID: 2178360212967849
Subject: Computer software and theory
Abstract/Summary:
Clustering has achieved great success in many fields, including pattern recognition, system modeling, image processing, and data mining. Its basic algorithms have been widely applied in the life sciences, medicine, the social sciences, geography, and other areas. Clustering is the process of grouping a set of physical or abstract objects into classes of similar objects. A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. Clustering is a typical unsupervised learning technique. This thesis focuses on key techniques and algorithms of cluster analysis. The main research content includes:

(1) To address a deficiency of traditional K-means clustering, this thesis proposes an improved algorithm. When a class changes little after an update, its old class center is kept, so there is no need to recompute the class center or the distances between the samples and that class. Experiments show that the improved algorithm spends less time on clustering while retaining satisfactory precision.

(2) A hierarchical clustering algorithm based on granularity is presented. In one iteration, if the distance between a pair of clusters is less than the given threshold, the two are regarded as adjacent clusters under the current granularity and are merged. The process repeats until the stopping condition is satisfied. Experiments show that this algorithm achieves hierarchical clustering of a data set and requires less time without sacrificing precision.

(3) Inspired by the CURE algorithm, this thesis puts forward a new clustering algorithm that represents each cluster by multiple representative points ("deputies"). It obtains the clustering result by first partitioning the samples into a smaller number of atomic clusters and then merging adjacent atomic clusters. Experiments show that the algorithm is more robust to outliers and can identify clusters that have non-spherical shapes and wide variances in size. It is also a linear-time clustering algorithm and therefore facilitates the clustering of very large data sets.

(4) A new method is presented, which combines the hierarchical clustering...
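The threshold-based merging described in item (2) can be illustrated with a short sketch. This is not the thesis's implementation, which the abstract does not give. Several details are assumptions made only for illustration: clusters are compared by the Euclidean distance between their centroids, the threshold (the "granularity") grows by a fixed step after each pass, and the process stops once a target number of clusters is reached. The names `merge_pass`, `granular_clustering`, `start`, `step`, and `target_k` are hypothetical.

```python
import math


def centroid(points):
    """Mean of a non-empty list of equal-length coordinate tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def merge_pass(clusters, threshold):
    """One pass at a fixed granularity.

    Clusters whose centroids lie closer than `threshold` are treated as
    adjacent, and every chain of adjacent clusters is merged into one.
    Centroid distance is an assumption; the abstract does not say how
    the distance between two clusters is measured.
    """
    n = len(clusters)
    parent = list(range(n))  # union-find forest over cluster indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    centers = [centroid(c) for c in clusters]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(centers[i], centers[j]) < threshold:
                parent[find(i)] = find(j)

    merged = {}
    for i, members in enumerate(clusters):
        merged.setdefault(find(i), []).extend(members)
    return list(merged.values())


def granular_clustering(points, start, step, target_k):
    """Return the list of clusterings produced at each granularity level.

    The linear threshold schedule (start, start + step, ...) and the
    stopping rule (at most `target_k` clusters) are assumptions.
    """
    if step <= 0 or target_k < 1:
        raise ValueError("step must be positive and target_k at least 1")
    clusters = [[p] for p in points]  # finest granularity: one point each
    levels = [clusters]
    threshold = start
    while len(clusters) > target_k:
        clusters = merge_pass(clusters, threshold)
        levels.append(clusters)
        threshold += step  # coarsen the granularity for the next pass
    return levels
```

Calling `granular_clustering(points, start=1.5, step=5.0, target_k=2)` on a small list of 2-D tuples returns one clustering per level, which together form the hierarchy. The pairwise centroid comparison in each pass is quadratic in the number of current clusters, so this sketch makes no claim about the running time reported in the thesis.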
Keywords/Search Tags:cluster analysis, algorithm, time complexity, Hierarchical Cluster, K-Means