Improvement And Application Of K-means Algorithm

Posted on: 2014-12-09
Degree: Master
Type: Thesis
Country: China
Candidate: J C Huang
Full Text: PDF
GTID: 2268330425972655
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Data mining is the process of discovering latent, useful, and valuable information in massive data sets in order to support information needs and decision-making in the financial and real estate industries. Cluster analysis is one of the most important methods of data mining, and the K-means algorithm is the simplest and most basic method of cluster analysis. K-means is simple to operate, fast, and scales well to large data sets, but in practical data processing it also often exposes serious flaws. To address these flaws, this thesis improves and analyzes the K-means algorithm in three respects.
1) Because the Euclidean distance treats all variables equally, the thesis proposes a coefficient-of-variation weighting method. A comparison with subjective, experience-based weighting on applied data shows that combining the coefficient-of-variation weighted Euclidean distance with the K-means algorithm is feasible and reasonable, and that it provides a method and a basis for processing real data.
2) Because the choice of K in the K-means algorithm is ambiguous and subjective, a distance cost function is proposed to determine the value of K accurately.
3) Because the K-means algorithm starts from random initial values, a method based on the distribution of the sample data and a greedy algorithm is constructed to find the initial values, and a program implementing it is written.
The improved initial values and the program are then applied to a data example. Judged by the clustering results, the total within-class and between-class distances, the number of iterations, and the degree of change between the initial and final cluster centers, the improved K-means algorithm is superior to the traditional K-means algorithm.
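The abstract does not give the weighting formula, so the following is only a minimal sketch of one common reading of coefficient-of-variation weighting, not the thesis's own procedure. It assumes each variable's weight is its coefficient of variation (population standard deviation divided by the absolute mean) normalized so the weights sum to 1, and that the weighted distance is the square root of the weighted sum of squared differences. The function names `cv_weights` and `weighted_distance` and the sample data are hypothetical.

```python
import math
import statistics


def cv_weights(rows):
    """Return one weight per column, proportional to its coefficient of variation.

    Assumption: every column has a nonzero mean and at least one column varies;
    otherwise a division by zero occurs.
    """
    columns = list(zip(*rows))
    cvs = [statistics.pstdev(col) / abs(statistics.mean(col)) for col in columns]
    total = sum(cvs)
    return [cv / total for cv in cvs]


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_j w_j * (x_j - y_j)^2)."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


# Hypothetical sample: both columns have the same spread, but the first has a
# much smaller mean, so its relative variation (and weight) is larger.
data = [(1.0, 9.0), (2.0, 10.0), (3.0, 11.0)]
weights = cv_weights(data)
d = weighted_distance(data[0], data[2], weights)
```

In a K-means loop, `weighted_distance` would replace the plain Euclidean distance when assigning each sample to its nearest center, so variables with larger relative variation contribute more to the assignment.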
Keywords/Search Tags: data mining, cluster analysis, K-means algorithm, weighted Euclidean distance, coefficient of variation method, K value, distance cost function, initial value