
Research on the K-means Clustering Algorithm

Posted on: 2014-12-07  Degree: Master  Type: Thesis
Country: China  Candidate: R R Han  Full Text: PDF
GTID: 2298330452462702  Subject: Computer Science and Technology
Abstract/Summary:
Cluster analysis is an important data mining technique used to discover relationships within a dataset. Unlike classification, clustering does not rely on any prior knowledge; it divides the data into classes based on the characteristics of the data themselves. There are many kinds of clustering methods, and partition-based methods, being simple and effective, are among the most commonly used. The main partition-based methods are K-means, K-medoids, and their variants.

The classic K-means clustering algorithm is a hard partition method: it assigns each data object in the dataset strictly to one class. The algorithm solves the problem iteratively; when the predetermined convergence condition is met, the iteration terminates and the final clustering result is output. The algorithm is simple, easy to understand, and highly efficient, so it is commonly used to process large datasets. However, K-means has some defects, and many improvements have been made to address them, including introducing fuzzy theory and rough set theory into the K-means algorithm, with good results. The main methods are FKM, HKM, RFKM, and their variants.

The thesis first introduces some basic concepts and techniques of cluster analysis, such as data structures, data types, and criterion functions. It then focuses on the classic K-means algorithm, covering its underlying idea and its steps, and analyzes and discusses its advantages and disadvantages in detail.

The thesis also introduces fuzzy set theory and rough set theory and their role in optimizing the classical K-means algorithm.
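The iterative hard-partition procedure described above can be sketched as follows. This is a minimal illustration of textbook K-means, not the thesis's own implementation. The function names, the random initialization, the squared Euclidean distance, and the `tol` and `max_iter` parameters are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, tol=1e-9, seed=0):
    """Hard-partition K-means (illustrative sketch).

    Assumed convergence condition: stop when no center moves by more
    than `tol` (squared distance), or after `max_iter` iterations.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumed init: k distinct data points
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each object goes to exactly one class.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a class ends up empty.
            new_centers.append(mean_point(members) if members else centers[j])
        shift = max(squared_distance(c, n) for c, n in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return labels, centers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centers = k_means(data, k=2)
    print(labels, centers)
```

Because every object receives a single label, a point lying between two classes is still forced into one of them; this is the "hard" property that the fuzzy and rough variants relax.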
The thesis first presents the fuzzy K-means algorithm and analyzes its advantages and disadvantages, pointing out one relatively obvious shortcoming: the membership degrees of each data object over all classes must sum to 1. This constraint is too strict, and when the dataset contains noise points the algorithm is strongly affected by them. The thesis then presents clustering algorithms based on rough sets; once rough set theory is introduced, the improved algorithm gains considerably in both clustering quality and efficiency. Building on these advantages, the thesis improves the fuzzy K-means clustering algorithm and the rough fuzzy K-means algorithm.

A new measure, the AM metric, is introduced into the fuzzy K-means algorithm and the rough fuzzy K-means algorithm to improve them. In addition, the normalization constraint on the membership degrees is relaxed, which yields the improved algorithms. Experimental analysis shows that, after the metric is replaced and the membership constraint is relaxed, the algorithms both improve clustering accuracy and shorten running time.
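The sum-to-1 constraint and its sensitivity to noise can be illustrated with the standard fuzzy c-means membership update. This is only a sketch of the conventional rule that the thesis criticizes. It does not implement the AM metric or the relaxed constraint, which the abstract does not specify. The function name, the Euclidean distance, and the fuzzifier `m = 2` are assumptions made for the example.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership update for a single point.

    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)), where d_i is the
    distance from the point to center i. The memberships always sum to 1.
    """
    dists = [math.dist(point, c) for c in centers]
    # If the point coincides with one or more centers, share the full
    # membership among those centers to avoid dividing by zero.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]


if __name__ == "__main__":
    centers = [(0.0, 0.0), (10.0, 0.0)]
    print(fuzzy_memberships((1.0, 0.0), centers))     # point close to the first center
    print(fuzzy_memberships((5.0, 1000.0), centers))  # distant noise point
```

In this example the noise point at `(5, 1000)` is equally far from both centers, so normalization gives it a membership of 0.5 in each class even though it is far from both. With such weights it still influences the center updates, which is the kind of noise sensitivity the abstract attributes to the strict constraint.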
Keywords/Search Tags: K-means clustering algorithm, Rough fuzzy clustering method, Fuzzy membership degree