Clustering Analysis Based On Improved K-Means Algorithm

Posted on:2016-10-08

Degree:Master

Type:Thesis

Country:China

Candidate:C C Zheng

Full Text:PDF

GTID:2208330470962880

Subject:Computer application technology

Abstract/Summary:

With the rapid development of technology, data mining has become increasingly close to people's daily lives, and research on the application of clustering algorithms has become an important direction. As one of the best-known and most commonly used clustering algorithms, K-means has the following strengths: it is simple and easy to understand, has low time complexity, converges quickly, is not sensitive to the input order, and can handle large data sets effectively. As a result, K-means has been widely used in practice.

In this thesis, we focus on the K-means algorithm and study both its strengths and its weaknesses. We identify two main disadvantages:
(1) It requires the exact number of clusters to be specified in advance. There is no clear standard for deciding how many clusters a given data set should be divided into; in most cases, the number is chosen based on personal experience.
(2) It does not consider that a data set may contain multiple views and variables. When clustering, K-means does not take the differences among views into account; it treats the variables of all views as a single flat set, and it also ignores the fact that different variables carry different weights within a view.

To address these two deficiencies, this thesis proposes an improved adaptive two-level variable weighting K-means clustering algorithm, which automatically determines the number of clusters from the features of the data set, without requiring the number to be input in advance. At the same time, the improved algorithm takes the characteristics of the different views and variables of the data set into consideration, instead of treating all the data as a single flat set of variables. To a great extent, the improved algorithm therefore overcomes these shortcomings of K-means.

The experimental results show that the improved algorithm does not require the final number of clusters to be pre-assigned; instead, it automatically arrives at a suitable number of clusters, which addresses a major drawback of K-means. Moreover, the improved algorithm accounts for the possibility that a data set contains multiple views and variables, and assigns weights to the views and variables automatically. In addition, we compare the improved adaptive two-level variable weighting algorithm with the two-level variable weighting algorithm and the adaptive K-means algorithm; the comparison indicates that the improved algorithm gives better clustering results and takes less execution time.
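
The idea of weighting at two levels (views, and variables within each view) can be illustrated with a small Python sketch. This is not the thesis's algorithm: the abstract does not give the distance formula or the weight update rules, so the function names, the product form of the weights, and the fixed weight values below are all assumptions made for illustration. The sketch only shows how an assignment step might use view weights and variable weights instead of treating all variables as one flat set.

```python
# Illustrative sketch only: the weighting scheme and all names here are
# assumptions, not the method described in the thesis.

def two_level_distance(x, center, views, view_weights, var_weights):
    """Weighted squared Euclidean distance.

    views        : list of lists; views[v] holds the variable indices of view v
    view_weights : one weight per view (assumed fixed here)
    var_weights  : one weight per variable (assumed fixed here)
    """
    total = 0.0
    for v, indices in enumerate(views):
        for j in indices:
            total += view_weights[v] * var_weights[j] * (x[j] - center[j]) ** 2
    return total


def assign(points, centers, views, view_weights, var_weights):
    """Assign each point to the index of its nearest center."""
    labels = []
    for x in points:
        dists = [
            two_level_distance(x, c, views, view_weights, var_weights)
            for c in centers
        ]
        labels.append(dists.index(min(dists)))
    return labels


# Two views: variables 0 and 1 form the first view, variable 2 the second.
views = [[0, 1], [2]]
view_weights = [0.9, 0.1]      # hypothetical: the first view matters more
var_weights = [0.5, 0.5, 1.0]  # hypothetical per-variable weights

points = [[0.0, 0.0, 0.0], [0.1, 0.0, 5.0], [5.0, 5.0, 0.0], [5.1, 5.0, 5.0]]
centers = [[0.0, 0.0, 2.5], [5.0, 5.0, 2.5]]

labels = assign(points, centers, views, view_weights, var_weights)
print(labels)  # [0, 0, 1, 1]
```

In a full algorithm of this kind, the weights would be learned from the data rather than fixed, and the number of centers would be chosen automatically; both steps are omitted here because the abstract does not describe them.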

Keywords/Search Tags:

Data Mining, Clustering Analysis, K-means Algorithm, Variable Weighting, Adaptive K Value

Related items

1	Research And Implementation On Variable Weighting In K-means Type Clustering
2	Research On Problems Related To The Initial Center Selection In K-means Clustering Algorithm
3	Research Of K-means Clustering Algorithm Based On Variable Precision Rough Set
4	Adaptive Clustering Algorithm And Its Application Technology Research And Implementation
5	A Research Of Genetic K-Means Algorithm Based On Variable Length Encoding
6	Research On The Improvement Of C-means Clustering Algorithm
7	Research And Improvement Of K-Means Clustering Algorithm
8	Improvement And Empirical Study Of K-means Clustering Algorithm On Panel Data Analysis
9	The Improvement On The Fuzzy C-means Algorithm
10	Semi-supervised K-means Clustering Algorithm In Data Mining