Some Problems Of Determining The Optimal Number Of Clusters In Clustering Analysis

Posted on:2014-06-26

Degree:Master

Type:Thesis

Country:China

Candidate:Y Song

GTID:2268330401460512

Subject:Basic mathematics

Abstract/Summary:

Clustering is the process of distinguishing and classifying the objects under study into groups according to the similarity (or dissimilarity) between individuals. Because no training sample set is available, it is an unsupervised classification process. Cluster analysis relies only on the degree of closeness between the objects under study and makes no assumptions about the data set. It uses mathematical and statistical methods to classify the given objects, and it is a multivariate statistical analysis method in which a reasonable number of clusters (the number of classes in the grouping) must be determined. The saying "Birds of a feather flock together" vividly captures the essence of clustering. Although clustering is an old problem, it has deepened along with the progress and development of human society. To understand the world, we must distinguish between different things and recognize their similarities (or dissimilarities), which in turn helps us grasp their nature and characteristics. Cluster analysis is an important research topic in data mining, pattern recognition and machine learning. As an important method for analyzing and understanding data, its importance and its interdisciplinary character have been widely recognized by scholars in many fields. It has been widely applied in the social and natural sciences, for example in psychology, biology, medicine, communications and computer science. Over time, a variety of clustering methods have been proposed for different needs and problems, of which K-means is the most classic. Although many dynamic clustering algorithms were proposed later, they still take K-means clustering as their basic model.
In other words, the number of clusters is assumed to be given in advance, and the objects under study are divided into classes according to the intrinsic attributes of the samples to be clustered, by optimizing the class centers or the memberships. A shortcoming of many such algorithms is that they provide no effective way to determine the number of clusters. In practical problems, the structure of the given data and the number of clusters are often not fully known (sometimes nothing is known about them at all), so determining the number of clusters objectively and accurately is a difficult problem. At the same time, many clustering algorithms (including most dynamic clustering methods) require the number of clusters to be provided beforehand in order to run. As a result, the clustering result depends heavily on the initial number of clusters, and the algorithm is likely to become trapped in a local optimum, so the reliability of the final clustering result cannot be guaranteed. This thesis discusses how to determine the optimal number of clusters in cluster analysis and how to assess the validity of the clustering results. On the basis of an analysis and summary of traditional clustering algorithms, an improved simulated annealing algorithm is applied to cluster analysis. A dynamic clustering algorithm for determining the number of clusters is proposed, which uses probabilistic perturbation to escape local optima, so that a reasonable number of clusters is determined during the clustering process itself and the clustering is made as close to optimal as possible.
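
The dependence on a preset cluster number described above can be seen in a small sketch. It assumes plain Lloyd-style K-means with squared Euclidean distance and random initial centers; the abstract does not state which variant the thesis uses, and the function names and toy data are hypothetical. The sketch shows that k has to be supplied by the caller, and that the within-cluster sum of squared errors (SSE) cannot select k by itself, because it drops to zero once every point is its own cluster.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, iters=100):
    """Lloyd-style K-means. The caller must choose k; returns (centers, labels, sse)."""
    centers = rng.sample(points, k)  # random initial centers (assumption)
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # no change, so the iteration has converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse


# Hypothetical toy data: three small 3x3 grids of points, far apart.
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
offsets = (-0.5, 0.0, 0.5)
points = [(cx + dx, cy + dy)
          for cx, cy in blob_centers for dx in offsets for dy in offsets]

rng = random.Random(0)
for k in (1, 2, 3, 4, len(points)):
    # Best SSE over a few random restarts for each preset k.
    best_sse = min(kmeans(points, k, rng)[2] for _ in range(10))
    print(k, round(best_sse, 3))
```

Repeating a run with a different seed can give a different partition for the same k, which is the sensitivity to initial conditions noted above.
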
The main work and results of this thesis can be summarized as follows: (1) The research background and significance are set out, the current state of research on the topic is analyzed, and the basic concepts and methods of cluster analysis are introduced. (2) The K-means algorithm is introduced and several existing methods for determining the number of clusters are analyzed. (3) An algorithm for determining the optimal number of clusters, based on the improved simulated annealing algorithm, is proposed, and its feasibility and validity are verified through case studies in practical applications. Finally, an outlook for future research is given.
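
The idea in item (3) can be illustrated with a generic simulated annealing search over the number of clusters k. This is only a sketch under stated assumptions: the abstract does not specify the thesis's improved annealing scheme, its perturbation rule, or its validity criterion. The sketch therefore substitutes standard Metropolis acceptance, moves of plus or minus one on k, geometric cooling, the Calinski-Harabasz index as a stand-in validity measure, and deterministic farthest-first seeding for K-means. All function names, parameter values and the toy data are hypothetical.

```python
import math
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(members):
    return tuple(sum(c) / len(members) for c in zip(*members))


def kmeans_labels(points, k, iters=100):
    """K-means with farthest-first seeding (chosen here for determinism)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return labels


def ch_index(points, labels, k):
    """Calinski-Harabasz index; assumes 2 <= k < n and nonzero within-cluster scatter."""
    n = len(points)
    total = sum(sq_dist(p, centroid(points)) for p in points)
    within = 0.0
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            c = centroid(members)
            within += sum(sq_dist(p, c) for p in members)
    return ((total - within) / (k - 1)) / (within / (n - k))


def anneal_k(points, k_min=2, k_max=6, t0=100.0, cooling=0.95, steps=200, seed=0):
    """Simulated annealing over k; returns the best (k, index value) visited."""
    rng = random.Random(seed)
    cache = {}

    def score(k):
        if k not in cache:  # clustering is deterministic, so cache per k
            cache[k] = ch_index(points, kmeans_labels(points, k), k)
        return cache[k]

    k = best_k = k_min  # start from the smallest allowed k (assumption)
    t = t0
    for _ in range(steps):
        cand = k + rng.choice((-1, 1))  # perturb k by one
        if k_min <= cand <= k_max:
            delta = score(cand) - score(k)
            # Metropolis rule: always accept improvements, sometimes accept worse.
            if delta >= 0 or rng.random() < math.exp(delta / t):
                k = cand
                if score(k) > score(best_k):
                    best_k = k
        t *= cooling  # geometric cooling schedule
    return best_k, score(best_k)


# Hypothetical toy data: three small 3x3 grids of points, far apart.
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
offsets = (-0.5, 0.0, 0.5)
points = [(cx + dx, cy + dy)
          for cx, cy in blob_centers for dx in offsets for dy in offsets]

best_k, best_score = anneal_k(points)
print(best_k, round(best_score, 1))
```

The starting temperature `t0` only makes sense relative to the scale of the chosen index: if it is far smaller than typical index differences, the search degenerates into greedy hill climbing over k.
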

Keywords/Search Tags:

cluster analysis, K-means clustering algorithm, simulated annealing algorithm, validity of the clustering results

Related items

1	Research And Improvement Of K-Means Clustering Algorithm
2	Research On Initialization Method Of Dividing Clustering Algorithm
3	Improvements And Implementation Of K-means Clustering Algorithm
4	Research Of Improved K-means Algorithm And New Cluster Validity Index In Cluster Analysis
5	Research And Comparison Of Several Kinds Of Clustering Algorithm For Image Segmentation
6	Improved Fuzzy C-Means Clustering Algorithm
7	Research On Improvement Of K-means Clustering Algorithm
8	Research On Clustering Algorithm Based On Evolutionary Thought And Cluster Fusion
9	Integration Of Research And Application Of The Algorithm Based On The Improvement Of The K-means Clustering
10	Research On Cluster Analysis Based On Optimized Genetic Algorithm