The Research Of K-means Clustering Algorithm Improvement

Posted on: 2016-11-17
Degree: Master
Type: Thesis
Country: China
Candidate: T T Li
GTID: 2308330461491778
Subject: Computer application technology
Abstract:
The rapid development of information technology and the rise of Web technology have pushed the acquisition and storage of information toward automation, speed, and intelligence. Faced with massive, irregular data resources, data mining technology emerged to meet the need. Within data mining, cluster analysis is an important research branch and an unsupervised learning technique. As an exploratory classification technique, it requires no prior knowledge: it partitions a collection of unlabeled data into different clusters such that objects in the same cluster are as similar as possible and objects in different clusters are as different as possible. Cluster analysis is applied in many fields, such as e-commerce, data statistics, Web analytics, biomedicine, and physical prospecting.

K-means is a classical partition-based clustering algorithm. It selects initial cluster centers to partition the data set, then recomputes each center as the mean of the objects assigned to that cluster. Through repeated iterations, the algorithm aims to make clusters as far apart as possible and the objects within each cluster as similar as possible. K-means is simple in principle, easy to implement, and scales well to large data sets with favorable time complexity. However, it has shortcomings: it is sensitive to the choice of initial cluster centers, and poor initial centers lead to poor results; the final result is often a local optimum rather than the global optimum. In addition, K-means requires the number of clusters K to be given in advance.
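To make the K-means procedure described above concrete, here is a minimal sketch of standard Lloyd-style K-means in plain Python with no third-party libraries. Initial centers are drawn at random from the data with a seeded generator, which is exactly the step the abstract identifies as the weak point. The function names `squared_distance` and `kmeans` are illustrative, not taken from the thesis.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Standard K-means (Lloyd's iteration) with random initial centers.

    Returns (centers, labels, iterations). The outcome can depend on `seed`,
    because the seed decides which data points become the initial centers.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial centers
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            return centers, labels, iteration
        labels = new_labels
        # Update step: move each center to the mean of its cluster.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels, max_iter


data = [(0.0, 0.0), (1.0, 1.0), (10.0, 10.0), (11.0, 11.0)]
centers, labels, n_iter = kmeans(data, k=2, seed=1)
print(sorted(centers), labels)
```

Running with different `seed` values and comparing the returned partitions is a quick way to observe the sensitivity to initial centers; on data as well separated as this toy set, every choice of initial pair still converges to the same two clusters, but on overlapping data it generally does not.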
Based on the theory of adaptive feature weighting and genetic algorithms, this work addresses these deficiencies of the traditional K-means algorithm, preventing the clustering result from falling into a local optimum and effectively improving the accuracy and stability of the algorithm. The traditional algorithm uses fixed feature weights and depends on its initial centers, so we improve it as follows. The more important an attribute is, the higher its weight should be, so attribute weights are adjusted according to their importance while the algorithm runs. Without requiring K in advance, the algorithm chooses initial cluster centers according to the density of the data objects and then uses a criterion function to determine the best value of K; the initial centers are selected from among the elements of the high-density set. During iteration, the algorithm computes the intra-cluster and inter-cluster distances to adjust the attribute weights.

To improve K-means further, we combine the genetic algorithm with adaptive feature weighting. Using the adaptive feature weights, the algorithm first applies a genetic algorithm to obtain better initial cluster centers and then runs K-means on the whole data set, effectively improving the clustering result. The improved K-means algorithms are applied to experimental data sets and to image segmentation, an important application of clustering, and their experimental results are compared. Experiments on standard data sets compare the two improved algorithms in terms of accuracy, number of iterations, and cluster centers, and confirm that the improved K-means algorithms perform better.
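The abstract describes the improved algorithm only in words and gives no formulas, so the sketch below fills in each step with a labeled assumption. It assumes (1) a point's density is the number of other points within a hypothetical `radius`, and initial centers are taken from the denser half of the data and spread out by a farthest-point rule; (2) each feature's weight is proportional to the ratio of its between-cluster to within-cluster dispersion, recomputed after every center update. K is passed in rather than chosen by a criterion function, and the genetic-algorithm stage is not shown. `density_init`, `update_weights`, and `adaptive_weighted_kmeans` are illustrative names, not the thesis's code.

```python
def weighted_sq_dist(a, b, w):
    """Feature-weighted squared Euclidean distance: sum of w_d * (a_d - b_d)^2."""
    return sum(wd * (x - y) ** 2 for x, y, wd in zip(a, b, w))


def density_init(points, k, radius):
    """Hypothetical density-based choice of k initial centers (an assumption).

    Density = number of other points within `radius` (unweighted distance).
    The densest point is the first center; each further center is the point
    from the denser half of the data farthest from the centers chosen so far.
    """
    n, unit, r2 = len(points), [1.0] * len(points[0]), radius ** 2

    def d2(a, b):
        return weighted_sq_dist(a, b, unit)

    density = [
        sum(1 for j in range(n) if j != i and d2(points[i], points[j]) <= r2)
        for i in range(n)
    ]
    order = sorted(range(n), key=lambda i: -density[i])  # densest first
    dense = order[: max(k, n // 2)]  # the high-density candidate set
    centers = [list(points[dense[0]])]
    while len(centers) < k:
        far = max(dense, key=lambda i: min(d2(points[i], c) for c in centers))
        centers.append(list(points[far]))
    return centers


def update_weights(points, labels, centers, eps=1e-12):
    """Hypothetical adaptive weights: between- / within-cluster dispersion.

    Features that are tight inside clusters but far apart across clusters
    get larger weights. Weights are scaled to sum to the number of features.
    """
    n, dims = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dims)]
    sizes = [labels.count(j) for j in range(len(centers))]
    scores = []
    for d in range(dims):
        within = sum((p[d] - centers[lab][d]) ** 2 for p, lab in zip(points, labels))
        between = sum(s * (c[d] - mean[d]) ** 2 for s, c in zip(sizes, centers))
        scores.append(between / (within + eps))
    total = sum(scores)
    if total == 0:  # no feature separates the clusters: fall back to equal weights
        return [1.0] * dims
    return [dims * s / total for s in scores]


def adaptive_weighted_kmeans(points, k, radius, max_iter=100):
    """K-means with density-based initial centers and adaptive feature weights."""
    weights = [1.0] * len(points[0])
    centers = density_init(points, k, radius)
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step under the current feature weights.
        new_labels = [
            min(range(k), key=lambda j: weighted_sq_dist(p, centers[j], weights))
            for p in points
        ]
        if new_labels == labels:  # partition is stable: stop
            return centers, labels, weights, iteration
        labels = new_labels
        # Update step: centers become cluster means.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        # Re-weight features from the new intra-/inter-cluster dispersion.
        weights = update_weights(points, labels, centers)
    return centers, labels, weights, max_iter


data = [(0.0, 0.0), (1.0, 5.0), (0.5, 10.0), (10.0, 0.0), (11.0, 5.0), (10.5, 10.0)]
centers, labels, weights, n_iter = adaptive_weighted_kmeans(data, k=2, radius=6.0)
print(labels, weights)
```

In the toy data, feature 1 takes the same values (0, 5, 10) in both groups, so once the two groups are found its between-cluster dispersion is zero, its weight drops to 0, and distances depend on feature 0 alone. This is the behavior the abstract asks for (more important attributes get higher weights), under the particular ratio assumed here.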
Keywords: Clustering analysis, K-means algorithm, Feature weighting, Genetic algorithm