Clustering Analysis Based On Improved K-Means Algorithm

Posted on: 2016-10-08
Degree: Master
Type: Thesis
Country: China
Candidate: C C Zheng
Full Text: PDF
GTID: 2208330470962880
Subject: Computer application technology
Abstract/Summary:
With the rapid development of technology, data mining has become increasingly intertwined with people's daily lives, and research on applications of clustering algorithms has become an important direction. As one of the best-known and most commonly used clustering algorithms, K-means has the following strengths: it is simple and easy to understand, has low time complexity, converges quickly, is not sensitive to the order of the input data, and handles large data sets effectively. K-means has therefore been widely used in practice.

This thesis studies the K-means algorithm, examining both its strengths and its weaknesses. We identify two main disadvantages: (1) The exact number of clusters must be specified in advance. There is no clear standard for deciding how many clusters a given data set should be divided into; in most cases the number is chosen from personal experience. (2) It does not consider that a data set may contain multiple views and variables. When clustering, K-means does not take the differences among views into account; it treats the variables of all views as a single flat group, and it ignores the fact that variables within a view carry different weights.

To address these two deficiencies, this thesis proposes an improved adaptive two-level variable weighting K-means clustering algorithm, which determines the number of clusters automatically from the features of the data set, without requiring it as input. The improved algorithm also takes the characteristics of the different views and variables into account, instead of treating all the data as a single group of flat vectors. To a great extent, it therefore overcomes the shortcomings of K-means.

The experimental results show that the improved algorithm does not require the final number of clusters to be specified in advance; it arrives at a suitable number automatically, which largely overcomes this drawback of K-means. Moreover, because the improved algorithm accounts for data sets that contain multiple views and variables, it assigns weights to the views and variables automatically. In addition, we compare the improved adaptive two-level variable weighting algorithm with the two-level variable weighting algorithm and the adaptive K-means algorithm; the comparison shows that the improved algorithm produces better clustering results and takes less execution time.
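
To illustrate the second weakness, the following minimal sketch contrasts a flat K-means assignment step with one that uses a two-level weighted distance. It is not the algorithm of the thesis: the abstract gives no formulas, so the distance form (view weight times variable weight times squared difference), the function names, and the hand-picked weights are all assumptions made for illustration. In the thesis the weights are set automatically, whereas here they are fixed.

```python
from typing import List, Sequence


def two_level_weighted_sq_dist(
    x: Sequence[float],
    center: Sequence[float],
    views: Sequence[Sequence[int]],
    view_weights: Sequence[float],
    var_weights: Sequence[float],
) -> float:
    """Squared distance in which variable j of view t is scaled by
    view_weights[t] * var_weights[j] (an assumed form, for illustration)."""
    total = 0.0
    for t, view in enumerate(views):
        for j in view:
            total += view_weights[t] * var_weights[j] * (x[j] - center[j]) ** 2
    return total


def assign(points, centers, views, view_weights, var_weights) -> List[int]:
    """Assignment step of K-means: index of the nearest center for each point."""
    return [
        min(
            range(len(centers)),
            key=lambda c: two_level_weighted_sq_dist(
                p, centers[c], views, view_weights, var_weights
            ),
        )
        for p in points
    ]


# Toy data: variables 0 and 1 form an informative view, variable 2 is a noisy view.
points = [(0.0, 0.0, 5.0), (0.2, 0.1, -5.0), (4.0, 4.1, 5.0), (4.2, 3.9, -5.0)]
centers = [(0.0, 0.0, 5.0), (4.0, 4.0, -5.0)]
views = [[0, 1], [2]]

# Flat treatment: every view and variable counts equally (plain K-means distance).
flat_labels = assign(points, centers, views, [1.0, 1.0], [1.0, 1.0, 1.0])

# Two-level weighting: the noisy view is given weight 0 (hand-picked here).
weighted_labels = assign(points, centers, views, [1.0, 0.0], [0.5, 0.5, 1.0])

print(flat_labels)      # [0, 1, 0, 1]  grouping driven by the noisy variable
print(weighted_labels)  # [0, 0, 1, 1]  grouping driven by the informative view
```

With equal weights the noisy third variable dominates the distance and splits the two natural groups. Down-weighting its view recovers them, which is the motivation the abstract gives for weighting views and variables.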
Keywords/Search Tags: Data Mining, Clustering Analysis, K-means Algorithm, Variable Weighting, Adaptive K Value