
Study of the K-means Algorithm Based on the Vertical Midpoint of Density

Posted on: 2013-08-02
Degree: Master
Type: Thesis
Country: China
Candidate: W B Zhou
GTID: 2268330401451318
Subject: Computer Science and Technology
Abstract/Summary:
Data mining is a specialized data processing technology. By analyzing large amounts of data or information, it uncovers potentially useful, previously unknown knowledge that helps people make sound decisions. Data mining is currently a frontier research topic in information and database technology and is widely recognized as one of the most promising key technologies.

Clustering analysis is one of the main preprocessing methods in data mining and, with the growth of data mining research in recent years, has attracted widespread scholarly attention. Clustering analysis is a way of grouping data reasonably. Scholars in China and abroad have proposed a variety of clustering algorithms, which have to some extent promoted the development of data mining. Building on a study of clustering methods, this thesis examines in depth the most commonly used clustering algorithm, K-means. To address the shortcomings of the traditional K-means algorithm, it proposes an initial cluster center selection algorithm based on the vertical midpoint of density. The main research contents are as follows:

(1) The idea behind the traditional K-means algorithm is simple, but the clustering result is sensitive to the choice of initial cluster centers. Because the initial centers are chosen at random, the clustering result is unstable, and as the sample space grows the algorithm takes more and more time. To address this, the thesis proposes using density-based vertical midpoints as the initial cluster centers. The density value of each sample point is calculated to reflect the overall distribution of the data, and the vertical midpoint of the current highest-density point and its nearest point is taken as an initial cluster center, which is more representative.

(2) The traditional K-means method requires the number of clusters to be set manually, which demands considerable experience from the researcher, and the clustering result is affected by that number. Therefore, based on the equilibrium of the evaluation function, the thesis improves the algorithm around the essence of clustering: "low similarity between different classes, high similarity within the same class". First, the number of clusters is set automatically and selected iteratively. Then the improved clustering evaluation function is used to evaluate the clustering result until its minimum value is found; the corresponding k is taken as the optimal number of clusters.

(3) The improved algorithm and the traditional K-means algorithm were compared experimentally on UCI data sets. The experiments show that the improved algorithm has clear advantages over the traditional K-means algorithm, and that it also achieves higher accuracy and better stability than some of the improved clustering algorithms in the literature.

The experimental results show that the K-means algorithm based on the vertical midpoint of density overcomes the shortcomings of the traditional K-means clustering algorithm and has some application value.
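The initial-center selection described in item (1) might be sketched roughly as follows. The abstract does not define the density measure or say how the "current" highest-density point is updated between picks, so this sketch assumes that density is the number of neighbours within a fixed `radius`, and that points within `radius` of a chosen center are excluded from later picks. The function name `density_midpoint_centers` and the `radius` parameter are hypothetical and not taken from the thesis.

```python
import math

def density_midpoint_centers(points, k, radius):
    """Pick up to k initial centers from `points` (a list of coordinate tuples).

    The density definition and the exclusion rule are illustrative
    assumptions; the abstract does not specify either.
    """
    n = len(points)
    # Assumed density: number of other points within `radius` of each point.
    density = [
        sum(1 for j in range(n)
            if j != i and math.dist(points[i], points[j]) <= radius)
        for i in range(n)
    ]
    available = set(range(n))
    centers = []
    while len(centers) < k and len(available) >= 2:
        # Current highest-density point among those still available
        # (ties are broken by the lowest index).
        p = max(sorted(available), key=lambda i: density[i])
        # Its nearest available neighbour.
        q = min(sorted(available - {p}),
                key=lambda j: math.dist(points[p], points[j]))
        # Midpoint of the segment joining the two points.
        center = tuple((a + b) / 2.0 for a, b in zip(points[p], points[q]))
        centers.append(center)
        # Assumed exclusion rule: drop p, q, and every point within
        # `radius` of the new center before the next pick.
        available = {i for i in available
                     if i not in (p, q)
                     and math.dist(points[i], center) > radius}
    return centers
```

For two well-separated groups of points, a call such as `density_midpoint_centers(points, k=2, radius=1.5)` should return one midpoint inside each group. If the available points run out first, fewer than `k` centers are returned.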
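The automatic choice of the cluster number in item (2) can be illustrated with a small sketch. The abstract does not give the improved evaluation function, so the score below is a stand-in assumption, not the thesis's function: the mean point-to-center distance divided by the smallest distance between centers, which is low when clusters are compact and well separated. The initializer `farthest_point_init` is also only a deterministic placeholder, and all function names are hypothetical.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations starting from the given initial centers."""
    centers = [tuple(map(float, c)) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assign every point to its closest center.
        new_labels = [min(range(len(centers)),
                          key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels

def evaluation(points, centers, labels):
    """Assumed stand-in score: mean intra-cluster distance / smallest center gap."""
    intra = sum(math.dist(p, centers[l]) for p, l in zip(points, labels)) / len(points)
    inter = min(math.dist(a, b)
                for i, a in enumerate(centers) for b in centers[i + 1:])
    return intra / inter if inter > 0 else math.inf

def farthest_point_init(points, k):
    """Placeholder deterministic initializer (not the thesis's method)."""
    centers = [points[0]]
    while len(centers) < k:
        # Add the point that is farthest from all centers chosen so far.
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def choose_k(points, k_max, init=farthest_point_init):
    """Try k = 2..k_max and return the k with the smallest evaluation score."""
    best_k, best_score = None, math.inf
    for k in range(2, k_max + 1):
        centers, labels = kmeans(points, init(points, k))
        score = evaluation(points, centers, labels)
        if score < best_score:
            best_k, best_score = k, score
    return best_k
```

The `init` argument is left pluggable so that a density-based initializer could be passed in instead of the placeholder. Whether the thesis searches k over a fixed range in this way is not stated in the abstract.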
Keywords/Search Tags: Clustering, K-means, density, vertical midpoint, evaluation function