The Research And Application Of The K-means Clustering Algorithm Based On Influence Space

Posted on: 2017-08-25
Degree: Master
Type: Thesis
Country: China
Candidate: W C Zhao
GTID: 2348330509452866
Subject: Computer Science and Technology
Abstract/Summary:
Cluster analysis is an important data mining technique that is widely used in machine learning, pattern recognition, image processing, and many other fields. K-means is a classic clustering algorithm that is widely used in practice, but it still has several shortcomings, including sensitivity to noise data and to the choice of initial centers, and the high time cost of its distance calculations. This thesis first proposes a series of improvement strategies for these problems, then applies the improved clustering algorithms to the analysis of celestial spectral data. The main research contents are as follows:

(1) Because the traditional K-means clustering algorithm is sensitive to the initial centers and to noise data, this thesis presents an initialization algorithm for K-means based on the influence space. The algorithm uses the influence space, a simple data structure, to partition the data set into small regions, and searches each region for a representative data object that stands for that region. These representative data objects are then clustered using a weighted distance attraction factor to obtain the initial centers. The selected initial centers lie mainly in the local regions of highest density, which effectively reduces the interference of noise data with the clustering results. The results show that the proposed algorithm has an advantage in both clustering precision and the number of clustering iterations.

(2) Because distance calculation in each iteration is time-consuming, this thesis proposes a fast K-means algorithm. The new algorithm runs K-means on the set of representative data objects described in (1) until the criterion function converges. The class assigned to a representative data object is then also the class of all data points in its region. The proposed algorithm improves clustering efficiency by reducing the amount of data processed during clustering. From these studies it can be concluded that the proposed algorithm is more effective.

(3) On the basis of the above studies, the clustering algorithms are used to analyze LAMOST spectral data. A prototype clustering analysis system for spectral data is designed, and the functions and key technologies of the system are introduced. The system provides an effective way to analyze and explore the useful information hidden in the spectral data.
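The pipeline described in (1) and (2) can be illustrated with a minimal sketch. It is not the thesis's implementation, since the abstract does not define the influence space or the weighted distance attraction factor. Two assumptions stand in for them here: a uniform grid (parameter cell_size) plays the role of the influence-space partition, and the initial centers are picked by a simple "region size times distance to already chosen centers" score. All function names are hypothetical.

```python
import math
from collections import defaultdict


def build_representatives(points, cell_size):
    """Partition points into grid cells (ASSUMED stand-in for influence-space regions).

    Returns parallel lists: one representative point per region, the region's
    size (used as a weight), and the indices of the points in that region.
    """
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(math.floor(c / cell_size) for c in p)].append(idx)

    dim = len(points[0])
    reps, weights, members = [], [], []
    for idxs in cells.values():
        mean = [sum(points[i][d] for i in idxs) / len(idxs) for d in range(dim)]
        # Representative = the real data point closest to the region's mean.
        rep_idx = min(idxs, key=lambda i: math.dist(points[i], mean))
        reps.append(points[rep_idx])
        weights.append(len(idxs))
        members.append(idxs)
    return reps, weights, members


def seed_centers(reps, weights, k):
    """Pick k dense, mutually distant representatives (ASSUMED heuristic)."""
    if k > len(reps):
        raise ValueError("k exceeds the number of representatives")
    chosen = [max(range(len(reps)), key=lambda i: weights[i])]
    while len(chosen) < k:
        def score(i):
            # Heavy regions far from every chosen center score highest.
            return weights[i] * min(math.dist(reps[i], reps[c]) for c in chosen)
        chosen.append(max(range(len(reps)), key=score))
    return [list(reps[i]) for i in chosen]


def weighted_kmeans(reps, weights, k, max_iter=100):
    """Lloyd iterations over representatives only, weighted by region size."""
    centers = seed_centers(reps, weights, k)
    dim = len(reps[0])
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(r, centers[c]))
                      for r in reps]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            idx = [i for i, lab in enumerate(labels) if lab == c]
            if not idx:
                continue  # keep the old center for an empty cluster
            total = sum(weights[i] for i in idx)
            centers[c] = [sum(weights[i] * reps[i][d] for i in idx) / total
                          for d in range(dim)]
    return centers, labels


def cluster_via_representatives(points, k, cell_size):
    """Cluster representatives, then copy each label to its whole region."""
    reps, weights, members = build_representatives(points, cell_size)
    centers, rep_labels = weighted_kmeans(reps, weights, k)
    labels = [None] * len(points)
    for rep_label, idxs in zip(rep_labels, members):
        for i in idxs:
            labels[i] = rep_label
    return centers, labels
```

Each iteration computes distances only for the representatives, so the per-iteration cost depends on the number of regions rather than the number of data points. Sparse regions carry little weight in the seeding score, so isolated noise points are less likely to be chosen as initial centers.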
Keywords/Search Tags: K-means, Influence space, Initial optimization, ISBFK-means, Spectrum data