
Research On Clustering Algorithm Based On Automatic Determination Of Class Number Technology

Posted on: 2022-06-24; Degree: Master; Type: Thesis
Country: China; Candidate: Y Wei; Full Text: PDF
GTID: 2518306605973169; Subject: Master of Engineering
Abstract/Summary:
Clustering is the process of dividing a collection of physical or abstract objects into classes composed of similar objects. It is an efficient unsupervised learning method that is widely used in machine learning, data mining, pattern recognition, image analysis, bioinformatics, and other fields. Existing clustering algorithms still face difficulties, such as uneven density among clusters, diverse cluster shapes, and the identification of cluster centers in complex data sets. To tackle these problems, this thesis studies the density-peak clustering algorithm and improves on existing density-peak clustering algorithms. The specific contributions are as follows.

(1) When cluster centers and the number of clusters are selected manually, an improper selection leads to unsatisfactory clustering results. To address this, a clustering algorithm is proposed that automatically determines the correct cluster centers and the number of clusters. The main idea is to combine the density-peak clustering algorithm with a genetic algorithm. First, the density-peak clustering algorithm performs an initial clustering of the data set, and the density-peak decision graph is used to select more candidate cluster centers than the correct number of centers. Then the genetic algorithm selects the final set of cluster centers from these candidates, using a clustering evaluation index as the fitness function. In this way, manual selection of an improper number of cluster centers is avoided. Extensive experiments show that determining the cluster centers automatically not only reduces human intervention but also greatly improves the performance of the clustering algorithm, and that the algorithm efficiently identifies the correct number of clusters and their centers. Experimental comparison with other typical mainstream clustering algorithms shows that the proposed algorithm performs better.

(2) Existing clustering algorithms tend to lose sparse clusters when the clusters in a data set are unevenly distributed, which leads to poor clustering results. To address this, a clustering algorithm is proposed for data sets whose clusters have uneven density. The main reason existing algorithms perform poorly on such data sets is that data points in sparse regions are treated as noise, so the sparse clusters are lost. A scheme is therefore needed to identify the points in sparse regions so that they are not treated as noise. To this end, the proposed algorithm improves the local density function of the density-peak clustering algorithm by increasing its value at the centers of sparse regions, so that the centers of clusters in sparse regions can also be identified on the decision graph instead of being regarded as noise. Experimental results show that the new algorithm identifies the centers of sparse clusters on multiple test data sets and that the final clustering results are more accurate. On some data sets, experimental comparison with other typical mainstream clustering algorithms shows that the proposed algorithm obtains better results.
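
The pipeline in contribution (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the density kernel, the evaluation index, or the genetic operators. The Gaussian-kernel density, the ranking of candidates by the product of density and distance, the Calinski-Harabasz index as fitness, the tournament/crossover/mutation scheme, and all function and parameter names (`decision_graph`, `assign`, `select_centers`, `dc`, `n_candidates`, and so on) are assumptions made for illustration.

```python
import numpy as np

def decision_graph(X, dc):
    """Density-peak quantities: local density rho, distance delta to the nearest
    denser point, that point's index (parent), and the density ordering."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0    # Gaussian kernel (assumed), self excluded
    order = np.argsort(-rho)                          # point indices, densest first
    delta = np.zeros(len(X))
    parent = np.full(len(X), -1)
    delta[order[0]] = d[order[0]].max()               # usual convention for the global peak
    for rank in range(1, len(X)):
        i, denser = order[rank], order[:rank]
        j = denser[np.argmin(d[i, denser])]
        delta[i], parent[i] = d[i, j], j
    return rho, delta, parent, order

def assign(centers, parent, order):
    """Give every non-center point the label of its nearest denser neighbor.
    `centers` must contain the global density peak order[0]."""
    labels = np.full(len(parent), -1)
    labels[centers] = np.arange(len(centers))
    for i in order:                                   # densest first, so parents are labeled
        if labels[i] < 0:
            labels[i] = labels[parent[i]]
    return labels

def calinski_harabasz(X, labels):
    """Assumed fitness function: Calinski-Harabasz index (higher is better)."""
    ks = np.unique(labels)
    n, k = len(X), len(ks)
    if k < 2 or k >= n:
        return -np.inf
    mean = X.mean(axis=0)
    between = within = 0.0
    for c in ks:
        pts = X[labels == c]
        between += len(pts) * np.sum((pts.mean(axis=0) - mean) ** 2)
        within += np.sum((pts - pts.mean(axis=0)) ** 2)
    return (between / (k - 1)) / (within / (n - k))

def select_centers(X, n_candidates=6, dc=1.0, pop=20, gens=30, p_mut=0.1, seed=0):
    """Pick a subset of candidate centers with a small genetic algorithm."""
    rng = np.random.default_rng(seed)
    rho, delta, parent, order = decision_graph(X, dc)
    gamma = rho * delta
    gamma[order[0]] = np.inf                          # keep the global peak as candidate 0
    cand = np.argsort(-gamma)[:n_candidates]          # deliberately more than needed
    m = len(cand)

    def fitness(mask):
        return calinski_harabasz(X, assign(cand[mask], parent, order))

    population = rng.random((pop, m)) < 0.5           # one bit per candidate center
    population[0] = True                              # also try "keep every candidate"
    population[:, 0] = True                           # the global peak is always a center
    for _ in range(gens):
        scores = np.array([fitness(mask) for mask in population])
        children = [population[np.argmax(scores)].copy()]        # elitism
        while len(children) < pop:
            a, b = (population[max(rng.choice(pop, 2), key=lambda t: scores[t])]
                    for _ in range(2))                           # binary tournaments
            cut = rng.integers(1, m)                             # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(m) < p_mut                       # bit-flip mutation
            child[0] = True
            children.append(child)
        population = np.array(children)
    scores = np.array([fitness(mask) for mask in population])
    best = population[np.argmax(scores)]
    return cand[best], assign(cand[best], parent, order)
```

Calling `centers, labels = select_centers(X)` on an `(n, d)` array returns the indices of the retained centers and a label per point. Because the best chromosome is carried over unchanged each generation, the returned solution never scores worse under the chosen index than keeping every candidate. Whether it recovers the true number of clusters depends on the data, on `dc`, and on the index used.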
Keywords/Search Tags: density-peak clustering, genetic algorithm, clustering evaluation index, local density