The Research Of K-means Clustering Algorithm In Data Mining

Posted on: 2016-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yang
GTID: 2308330461988422
Subject: Circuits and Systems
Abstract/Summary:
In recent years, the volume of data worldwide has grown and accumulated continuously. Data processing techniques have lagged behind these amounts of data, and this gap has driven unprecedented, rapid development of data mining technology. Data mining is the process of analyzing large amounts of data to discover previously unknown, regular, and valuable information. Cluster analysis is one of the important techniques of data mining: it groups objects so that the similarity within a cluster is as large as possible while the similarity between different clusters is as small as possible. The K-means algorithm, a partition-based method, is one of the best-known and most widely used clustering algorithms. It processes data quickly and efficiently, and its computational cost scales well with data size. However, K-means requires the user to specify the number of clusters in advance, often terminates at a local optimum rather than the best clustering result, produces unstable results because the initial cluster centers are chosen at random, and is very sensitive to noise.

This thesis first introduces the historical background and significance of data mining, and then introduces cluster analysis in terms of the clustering criterion function, data types, data structures, and similarity measures. Next, it presents the principle of the K-means algorithm together with its advantages and disadvantages. Two improved algorithms are then proposed to address these disadvantages. First, because K-means requires the user to specify the value of k in advance, an algorithm that generates k automatically is proposed, which reduces the user's dependence on the choice of k. Second, a coordinate rotation algorithm is used to overcome the shortcoming of random initial centers, so that the clustering results become stable. Experiments confirm the reliability of both improved algorithms.
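The abstract does not give the details of the automatic-k or coordinate-rotation algorithms, so the following Python sketch is for reference only. It shows the standard K-means (Lloyd) iteration, which locally minimizes the clustering criterion J = Σ_j Σ_{x∈C_j} ‖x − μ_j‖² (within-cluster sum of squared errors). It seeds the iteration with a generic max-min distance heuristic that chooses both k and the initial centers deterministically. This heuristic is a commonly used technique; whether it matches the thesis's "maximum distance" method is not stated in the abstract. The function names `max_min_centers`, `kmeans`, `sse`, and `nearest_index` and the threshold parameter `theta` are hypothetical, and coordinate rotation is not implemented.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_index(p: Sequence[float], centers: List[Point]) -> int:
    """Index of the center closest to p (Euclidean distance; ties go to the lower index)."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))


def max_min_centers(points: List[Point], theta: float = 0.5) -> List[Point]:
    """Hypothetical max-min distance seeding (illustrative, not the thesis's method).

    Starts from the first point and adds the point farthest from it. It then keeps
    adding the point whose distance to its nearest chosen center is largest, as long
    as that distance exceeds theta * (distance between the first two centers).
    The number of returned centers is the automatically chosen k.
    """
    centers = [tuple(points[0])]
    centers.append(tuple(max(points, key=lambda p: math.dist(p, centers[0]))))
    threshold = theta * math.dist(centers[0], centers[1])
    while True:
        candidate = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        gap = min(math.dist(candidate, c) for c in centers)
        if gap <= threshold:  # no remaining point is far enough to open a new cluster
            break
        centers.append(tuple(candidate))
    return centers


def kmeans(points: List[Point], centers: List[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Standard Lloyd iteration: assign points to the nearest center, move centers to cluster means."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in points:
            clusters[nearest_index(p, centers)].append(p)
        # Mean of each cluster; an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else c
            for c, members in zip(centers, clusters)
        ]
        if new_centers == centers:  # converged: assignments can no longer change
            break
        centers = new_centers
    labels = [nearest_index(p, centers) for p in points]
    return centers, labels


def sse(points: List[Point], centers: List[Point], labels: List[int]) -> float:
    """Clustering criterion J: sum of squared distances from each point to its assigned center."""
    return sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    init = max_min_centers(points, theta=0.5)
    centers, labels = kmeans(points, init)
    print(len(init), labels, round(sse(points, centers, labels), 4))
    # prints: 2 [0, 0, 0, 1, 1, 1] 2.6667
```

If `max_min_centers` is replaced with random sampling of k points, the seeding becomes nondeterministic. Different random seeds can then converge to different local optima of J, which is the instability the abstract attributes to random initial centers.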
Finally, the application of the improved algorithms and the K-means algorithm to market segmentation is discussed. We segmented the various kinds of men's wallets sold on Taobao, together with the shops that sell them, using their sales data, and the segmentation results were very good. The results can support business decisions by enterprise decision makers, help enterprises enter the Taobao market, and greatly reduce their investment risk.
Keywords/Search Tags: Data Mining, K-means, Cluster Analysis, Coordinate Rotation, Maximum Distance