Research On The Improvement Of C-means Clustering Algorithm

Posted on: 2012-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: H Wang
Full Text: PDF
GTID: 2178330332494934
Subject: Computer application technology
Abstract/Summary:
As an unsupervised learning method, clustering analysis is an important tool for data processing in data mining. The purpose of clustering is to partition a data set without category labels into several clusters according to some rule, so that objects within the same cluster are highly similar while objects in different clusters are very dissimilar. Clustering has been widely applied in fields such as commerce, finance, image processing, and information retrieval. Research on clustering analysis mainly focuses on how to design practical clustering algorithms with better performance.

The C-means clustering algorithm is a typical partitioning method and includes the K-means (hard C-means) algorithm and the Fuzzy C-means (FCM) algorithm. It is simple to implement, fast, and effective. However, it has inherent disadvantages: the number of clusters must be given before clustering, which is difficult for inexperienced users; it often fails to reach the global optimum because it is easily trapped in a local optimum; the choice of the initial centers has a great influence on the clustering result; and the algorithm is sensitive to isolated points and noise. Improving these clustering algorithms further will help not only to refine their theory but also to promote their adoption and application.

In this thesis, we study the classic K-means clustering algorithm and the fuzzy C-means clustering algorithm in depth and summarize their strengths and weaknesses. To overcome shortcomings such as the sensitivity to the initial cluster centers and the high computational cost, two improved clustering algorithms are proposed, one based on the K-means algorithm and one based on the fuzzy C-means algorithm. The research and contributions can be summarized in two aspects as follows:

(1) The traditional K-means algorithm is sensitive to the initial cluster centers, isolated points, and noise, which leads to unstable and inaccurate clustering results.
To eliminate these shortcomings, an improved weighted K-means clustering algorithm based on attribute selection and feature weighting is proposed. In the presented algorithm, the data set is first preprocessed and the initial cluster centers are obtained according to the distribution of the data samples. Then the improved K-means algorithm using the feature weights is designed. Experimental results show that the proposed clustering algorithm produces high-quality clusterings stably.

(2) Existing studies have shown that the first component of the color feature set can effectively describe the color feature of a pixel. To reduce the computational cost, the first component of the color feature set is chosen as a one-dimensional feature vector, which represents each pixel in an image segmentation method based on the fuzzy C-means clustering algorithm. The number of clusters and the initial cluster centers are obtained based on rough set theory. Feature distance, which is suitable for any structure of the feature space, is used instead of Euclidean distance to overcome the influence of the structure of the feature space. Then an improved fuzzy C-means clustering algorithm is introduced to cluster the sample data. Experimental results show that the presented image segmentation method effectively improves the precision and accuracy of image segmentation, with small computational cost and fast convergence.
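
The abstract does not spell out how the feature weights or the initial centers in contribution (1) are computed. As a rough illustration only, the sketch below shows a generic feature-weighted K-means in Python with NumPy, where the weights and initial centers are simply passed in. The function name `weighted_kmeans` and its parameters are hypothetical and are not taken from the thesis.

```python
import numpy as np

def weighted_kmeans(X, init_centers, weights, n_iter=100):
    """Generic feature-weighted K-means (illustrative sketch, not the thesis's method).

    X            : (n_samples, n_features) data
    init_centers : (k, n_features) starting centers, assumed to be supplied
    weights      : (n_features,) non-negative feature weights, assumed to be supplied
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    w = np.asarray(weights, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared weighted distance: sum_j w_j * (x_ij - c_kj)^2
        diff = X[:, None, :] - centers[None, :, :]
        d2 = (w * diff ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its members; keep empty clusters in place
        new_centers = centers.copy()
        for k in range(len(centers)):
            members = X[labels == k]
            if len(members) > 0:
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

Setting a feature's weight to zero removes it from the distance entirely, so a noisy attribute can no longer pull points toward the wrong center. With all weights equal to one, the function reduces to ordinary K-means.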
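
For contribution (2), the standard fuzzy C-means updates on a one-dimensional per-pixel feature can be sketched as follows. This is a minimal sketch under stated assumptions: it uses the absolute difference as the distance rather than the thesis's feature distance (which the abstract does not define), and the initial centers are passed in rather than derived from rough set theory. The name `fuzzy_c_means_1d` and its parameters are hypothetical.

```python
import numpy as np

def fuzzy_c_means_1d(values, init_centers, m=2.0, n_iter=100, tol=1e-6):
    """Standard FCM on 1-D features (illustrative sketch, not the thesis's variant).

    values       : (n,) one feature value per pixel
    init_centers : (c,) starting cluster centers, assumed to be supplied
    m            : fuzzifier, must be greater than 1
    """
    x = np.asarray(values, dtype=float)
    v = np.asarray(init_centers, dtype=float).copy()
    u = None
    for _ in range(n_iter):
        # Distance from every center to every sample, shape (c, n)
        d = np.abs(x[None, :] - v[:, None])
        d = np.fmax(d, 1e-12)  # avoid division by zero when a sample sits on a center
        # Membership update: u_ik = d_ik^(-2/(m-1)) / sum_j d_jk^(-2/(m-1))
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=0, keepdims=True)
        # Center update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m
        um = u ** m
        new_v = (um @ x) / um.sum(axis=1)
        converged = np.max(np.abs(new_v - v)) < tol
        v = new_v
        if converged:
            break
    # u holds the memberships computed in the last iteration
    return u, v
```

A hard segmentation can be read off with `u.argmax(axis=0)`, which assigns each pixel to the cluster with the highest membership. Working on a single feature component keeps each distance computation a scalar operation, which is in line with the reduced computational cost the abstract describes.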
Keywords/Search Tags: Clustering Analysis, K-means Algorithm, Fuzzy C-means Algorithm, Feature Weighting