The Research And Application Of The K-means Clustering Algorithm Based On Influence Space

Posted on:2017-08-25

Degree:Master

Type:Thesis

Country:China

Candidate:W C Zhao

Full Text:PDF

GTID:2348330509452866

Subject:Computer Science and Technology

Abstract/Summary:

Cluster analysis is an important data mining technique that is widely used in machine learning, pattern recognition, image processing, and many other fields. K-means is a classic clustering algorithm that is widely used in practice, but it has several shortcomings, including sensitivity to noise data and to the choice of initial centers, and the high time cost of the distance calculations. This thesis proposes a series of improvement strategies that address these problems and then applies the improved clustering algorithms to the analysis of celestial spectral data. The main research contents are as follows:

(1) Because the traditional K-means clustering algorithm is sensitive to the initial centers and to noise data, this thesis presents an initialization algorithm for K-means based on the influence space. The algorithm uses the influence space, a simple data structure, to partition the data set into small regions, and searches each region for a representative data object. These representative objects are then clustered using a weighted distance attraction factor to obtain the initial centers. The selected initial centers lie mainly in the local regions of highest density, which effectively reduces the interference of noise data with the clustering results. The results show that the proposed algorithm has an advantage in both clustering precision and the number of clustering iterations.

(2) Because the distance calculations in each iteration are time-consuming, this thesis proposes a fast K-means algorithm. The new algorithm runs K-means on the set of representative objects described in (1) until the criterion function converges. The class assigned to a representative object is then also the class of all data points in its region. This improves clustering efficiency by reducing the amount of computation in the clustering process. The studies above lead to the conclusion that the proposed algorithm is more effective.

(3) Building on the studies above, the clustering algorithms are used to analyze LAMOST spectral data. A prototype clustering analysis system for spectral data is designed, and the functions and key technologies of the system are described. The system provides an effective way to analyze and explore the useful information hidden in the spectral data.
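The abstract does not define the influence space or the weighted distance attraction factor, so the following Python sketch only illustrates the general idea of points (1) and (2) under stated assumptions; it is not the thesis's algorithm. It assumes that the influence space of a point is its set of mutual k-nearest neighbours, that regions are formed greedily starting from the densest points, and that initial centers are picked by a simple "region size times distance" score standing in for the attraction factor. All function names and the parameter `k` are hypothetical.

```python
import math

def knn_sets(points, k):
    """For each point, the index set of its k nearest neighbours (self excluded)."""
    n = len(points)
    result = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        result.append(set(others[:k]))
    return result

def partition_by_influence_space(points, k):
    """Greedy partition; returns {representative index: [member indices]}.

    ASSUMPTION: q is in the influence space of p iff q is one of p's k nearest
    neighbours and p is one of q's (mutual k-nearest neighbours).
    """
    nn = knn_sets(points, k)
    influence = [{j for j in nn[i] if i in nn[j]} for i in range(len(points))]
    # Visit points with large influence spaces first, so that representatives
    # tend to sit in locally dense areas; isolated points end up alone.
    order = sorted(range(len(points)), key=lambda i: -len(influence[i]))
    regions, assigned = {}, set()
    for i in order:
        if i in assigned:
            continue
        members = [i] + [j for j in influence[i] if j not in assigned]
        assigned.update(members)
        regions[i] = members
    return regions

def weighted_kmeans(points, reps, weights, n_clusters, max_iter=100):
    """Lloyd iterations on the representatives only, weighted by region size.

    Requires n_clusters <= len(reps).
    """
    # ASSUMPTION: hypothetical stand-in for the weighted distance attraction
    # factor. Start from the heaviest representative, then repeatedly add the
    # one maximising weight * distance to the nearest already chosen center.
    chosen = [max(reps, key=lambda r: weights[r])]
    while len(chosen) < n_clusters:
        rest = [r for r in reps if r not in chosen]
        chosen.append(max(rest, key=lambda r: weights[r] * min(
            math.dist(points[r], points[c]) for c in chosen)))
    centers = [tuple(points[c]) for c in chosen]
    labels = {}
    for _ in range(max_iter):
        new_labels = {r: min(range(n_clusters),
                             key=lambda c: math.dist(points[r], centers[c]))
                      for r in reps}
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(n_clusters):
            group = [r for r in reps if labels[r] == c]
            if group:
                total = sum(weights[r] for r in group)
                centers[c] = tuple(
                    sum(weights[r] * points[r][d] for r in group) / total
                    for d in range(len(points[0])))
    return labels, centers

def fast_kmeans(points, n_clusters, k=3):
    """Cluster the representatives, then copy each label to its whole region."""
    regions = partition_by_influence_space(points, k)
    reps = list(regions)
    weights = {r: len(regions[r]) for r in reps}
    rep_labels, centers = weighted_kmeans(points, reps, weights, n_clusters)
    labels = [0] * len(points)
    for r, members in regions.items():
        for m in members:
            labels[m] = rep_labels[r]
    return labels, centers

# Two well-separated groups of five points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
labels, centers = fast_kmeans(points, n_clusters=2, k=3)
print(labels)
```

In this sketch the iterative distance calculations touch only the representatives, which is where the saving described in (2) would come from; the brute-force neighbour search used to build the regions is quadratic and would need a spatial index on real data.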

Keywords/Search Tags:

K-means, Influence space, Initial optimization, ISBFK-means, Spectrum data

Related items

1	The Research Of The K-means Clustering Algorithm Based On Nearest Neighbors
2	Research On K-means Optimization Clustering Algorithm
3	Research On Optimization And Parallel Of K-means Algorithm On Spark
4	Research On The Selection Of Initial Cluster Centers In K-means Algorithm
5	Improvement Of K-means Algorithm And Its Application In The Text Data Cluster
6	Research And Application Of K-means Clustering Algorithm
7	Based On The Selection Of The Initial Point Of K-means Clustering Algorithm And Its Application
8	Improvement And Application Of K-means Algorithm
9	Improvements And Implementation Of K-means Clustering Algorithm
10	Study On Problems To Select Initial Cluster Centers Of The K-means Algorithm