
Improvements Of K-means Clustering Algorithm

Posted on: 2017-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2308330488475439
Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer information technology, the amount of collected data is growing tremendously, and we constantly encounter image, text, video, audio and other kinds of data. A pressing problem is how to mine useful information or knowledge from such huge amounts of data quickly and effectively. Data mining emerged in response, providing many effective methods and tools for this problem. Clustering is one of its most important unsupervised methods. Clustering analysis has been steadily improved in recent years and has attracted growing attention; it has achieved good results in both theory and application, and is now widely used in fields such as pattern recognition, machine learning, text classification, image processing, marketing and scientific statistics.

Current clustering algorithms can be divided into several categories: hierarchical methods, partitioning methods, grid-based methods, density-based methods and model-based methods. Among them, the most popular is the k-means clustering algorithm. Although k-means is simple, fast and effective, it still has deficiencies: its results depend on and are sensitive to the choice of initial values, and it repeatedly calculates the distance from each data object to the cluster centers, which increases running time. In view of these shortcomings, the main work done in this paper is as follows:

1. To address the dependence of k-means on the initial values and its sensitivity to them, this paper proposes an improved k-means clustering algorithm that avoids random selection of the initial cluster centers. The algorithm selects an initial number of clusters and then applies a merging strategy, so the user does not need to specify the number of clusters in advance. Experiments comparing the results after merging with those of the traditional k-means clustering algorithm show that the improved algorithm obtains high-quality clustering results.

2. To address the increase in running time caused by repeated distance computation, this paper analyzes how k-means calculates, in every iteration, the distance from each data object to every cluster center, which keeps clustering efficiency low. The proposed improved k-means clustering algorithm uses a simple data structure to store some related information in each iteration, in order to make the iterative procedure more efficient. The improved method avoids repeatedly calculating the distance from each data object to the cluster centers, reduces the total running time, and improves efficiency.

Finally, the experimental results show that the improved algorithm reduces running time, improves the accuracy of the clustering results, and reduces the time complexity of the k-means clustering algorithm.
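The abstract does not say which data structure is stored or how it is used, so the following is only a minimal sketch of one possible reading of the second improvement, not the thesis's actual method. It assumes the cached information is each point's current cluster label and its distance to that cluster's center. After the centers move, a point whose distance to its own center has not grown keeps its label, and the other k - 1 distances are skipped. The names `cached_kmeans`, `nearest` and `dist` are hypothetical, and the initial centers are still sampled at random here for brevity. This shortcut is a heuristic and may not always give the same assignment as standard k-means.

```python
import math
import random


def dist(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest(point, centers):
    """Return (index, distance) of the center closest to point."""
    distances = [dist(point, c) for c in centers]
    j = min(range(len(centers)), key=distances.__getitem__)
    return j, distances[j]


def cached_kmeans(points, k, max_iter=100, seed=0):
    """k-means sketch that caches each point's label and center distance.

    Assumed heuristic (not confirmed by the abstract): if a point is no
    farther from its cluster's updated center than the cached distance,
    keep its label and skip the distances to the other k - 1 centers.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]

    # First pass: full assignment, which fills the cache.
    labels, cached = [], []
    for p in points:
        j, d = nearest(p, centers)
        labels.append(j)
        cached.append(d)

    for _ in range(max_iter):
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(x) / len(members) for x in zip(*members)])
            else:
                new_centers.append(centers[j])  # empty cluster keeps its center
        if new_centers == centers:
            break  # centers stopped moving
        centers = new_centers

        # Assignment step: one distance per point unless it moved away.
        for i, p in enumerate(points):
            d = dist(p, centers[labels[i]])
            if d > cached[i]:
                labels[i], d = nearest(p, centers)  # full search only here
            cached[i] = d

    return centers, labels
```

Calling `cached_kmeans(points, k)` on a list of numeric tuples returns the final centers and one label per point. The two parallel lists `labels` and `cached` play the role of the "simple data structure"; any saving in running time depends on how many points stay put between iterations.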
Keywords/Search Tags: Clustering algorithm, k-means algorithm, Distance, Initial center, Data mining