Study On K-means Algorithm Based On The Vertical Midpoint Of Density

Posted on:2013-08-02

Degree:Master

Type:Thesis

Country:China

Candidate:W B Zhou

GTID:2268330401451318

Subject:Computer Science and Technology

Abstract/Summary:

Data mining is a specialized data-processing technology. By analyzing large amounts of data, it uncovers potentially useful, previously unknown knowledge that helps people make sound decisions. It is currently a leading research topic in information and database technology and is widely regarded as one of the most promising key technologies. Cluster analysis is one of the main preprocessing methods in data mining, and it has drawn broad scholarly attention as data mining research has advanced. Cluster analysis is a way of partitioning data into reasonable groups. Researchers in China and abroad have proposed a variety of clustering algorithms, which have to some extent advanced the development of data mining. Building on a study of clustering methods, this thesis examines in depth the most commonly used clustering algorithm, K-means. To address the shortcomings of traditional K-means, it optimizes and improves the algorithm and proposes a method for selecting initial cluster centers based on the vertical midpoint of density. The main research contents are as follows:

(1) The idea behind traditional K-means is simple, but the clustering result is sensitive to the choice of initial cluster centers. Because those centers are chosen at random, the result is unstable, and the running time grows as the sample space grows. To address this, the thesis proposes using density-based vertical midpoints as the initial cluster centers. The density values of the sample points are computed to reflect the overall distribution of the data, and the vertical midpoint between the current highest-density point and its nearest point is taken as an initial cluster center, which is more representative.

(2) Traditional K-means requires the number of clusters to be set manually, which demands considerable experience from the researcher, and the clustering result is also affected by that number. Therefore, on the basis of a balanced evaluation function, the thesis improves the algorithm around the essence of clustering: "low similarity between classes, high similarity within a class." First, the number of clusters is set automatically and selected iteratively. Then the improved clustering evaluation function is used to evaluate the clustering result until its minimum is found; the corresponding k is taken as the optimal number of clusters.

(3) The improved algorithm is compared experimentally with traditional K-means on UCI data sets. The experiments show that the improved algorithm has clear advantages over traditional K-means, and that it achieves higher accuracy and better stability than some improved clustering algorithms from the literature.

The experimental results show that the K-means algorithm based on the vertical midpoint of density overcomes the shortcomings of traditional K-means and has some practical application value.
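The initial-center selection described in item (1) can be illustrated with a minimal Python sketch. The abstract does not define the density measure, the meaning of "vertical midpoint" in detail, or how later centers are kept apart, so everything below is an assumption made for illustration and not the thesis's actual method: density is taken as the number of other points within a hypothetical `radius`, the "vertical midpoint" is read as the midpoint of the segment joining the densest point and its nearest neighbor, and points within `radius` of a chosen center are excluded before the next center is picked. The function name `density_midpoint_centers` is likewise hypothetical.

```python
import math


def density_midpoint_centers(points, k, radius):
    """Pick up to k initial centers from `points` (a list of equal-length tuples).

    Assumptions (not stated in the abstract): density is a neighbor count
    within `radius`, and points near a chosen center are removed before
    the next center is selected.
    """
    remaining = list(points)
    centers = []
    while len(centers) < k and remaining:
        if len(remaining) == 1:
            # Only one point left: use it directly as a center.
            centers.append(tuple(float(x) for x in remaining[0]))
            break

        def density(i):
            # Assumed density: number of other remaining points within `radius`.
            return sum(
                1
                for j, q in enumerate(remaining)
                if j != i and math.dist(remaining[i], q) <= radius
            )

        # Index of the highest-density point (first one wins on ties).
        densest = max(range(len(remaining)), key=density)
        # Index of the point nearest to the densest point.
        nearest = min(
            (j for j in range(len(remaining)) if j != densest),
            key=lambda j: math.dist(remaining[densest], remaining[j]),
        )
        # Midpoint of the segment joining the two points.
        center = tuple(
            (a + b) / 2 for a, b in zip(remaining[densest], remaining[nearest])
        )
        centers.append(center)
        # Assumed exclusion step: drop points already covered by this center.
        remaining = [p for p in remaining if math.dist(center, p) > radius]
    return centers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    print(density_midpoint_centers(data, k=2, radius=1.5))
```

The returned centers would then seed an ordinary K-means run in place of random initialization. Because the selection is deterministic for a given `radius`, repeated runs on the same data start from the same centers, which is the stability property the abstract attributes to the approach.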

Keywords/Search Tags:

Clustering, K-means, density, vertical midpoint, evaluation function

Related items

1	The Research And Implementation Of Density-based Clustering Algorithm With Pattern Evaluation Methods
2	The Study And Development Of Hierarchical-K-means-Based Clustering Algorithm
3	Research On The Selection Of Initial Cluster Centers In K-means Algorithm
4	Research On Fuzzy Clustering Analysis Algorithm Based On Density
5	The Application Of Improved Fuzzy C Means Clustering Algorithm In Image Segmentation
6	Research On Improvement Of K-means Clustering Algorithm
7	The Research And Application Of Text Clustering Based On Improved K-means Algorithm
8	Trajectory Clustering Analysis Based On NMAST Density Function And Its Application
9	Research On Density Peak-based Clustering Algorithm And Its Parallel Implementation
10	Research On Fuzzy Clustering Algorithm Based On Density Function