Improvement of Clustering Algorithms and Research on the Clustering Validity Index

Posted on: 2022-10-06
Degree: Master
Type: Thesis
Country: China
Candidate: D X Cao
Full Text: PDF
GTID: 2518306557969909
Subject: Electronics and Communications Engineering
Abstract/Summary:
The advancement of information technology has given rise to data mining, and cluster analysis is one of its key techniques. Cluster analysis is an unsupervised learning technique that aims to group unlabeled data sets without using external prior information, and the clustering algorithm is its core component.

The K-Means algorithm is widely used in cluster analysis because of its simple principle and low time complexity. However, the value of K must be preset and the initial cluster centers are selected randomly, so the algorithm easily falls into local optima. The density peak clustering (DPC) algorithm is a newer clustering algorithm whose simple principle and high efficiency have attracted widespread attention in the academic community. However, it has the following shortcomings: (1) the cutoff distance is selected empirically; (2) the cluster centers are selected subjectively; (3) the remaining sample points are allocated in a single step.

The cluster validity index is an effective means of evaluating clustering results. It uses internal or external information about the clustering results to evaluate them, so that better clustering results can be obtained. Many cluster validity indices exist, but most have shortcomings such as poor stability and a narrow range of application.

To address these shortcomings, this thesis studies the related algorithms and indices in depth and proposes corresponding improvements. The main work and research results are as follows:

(1) An adaptive clustering algorithm based on the median maximum distance and SSE is proposed. Its main purpose is to overcome the shortcomings of the traditional K-Means algorithm. The trend of the SSE value during clustering is used to decide whether to continue or terminate the operation, so that the K value is determined automatically, and the maximum distance median method is used to compute more accurate initial cluster centers. Experimental comparisons show that, when the number of clusters is obtained accurately, the algorithm achieves higher accuracy and better stability.

(2) An adaptive density peak clustering algorithm based on K-nearest neighbors (KNN-ADPC) is proposed. First, inspired by the K-nearest neighbor algorithm, the local density of each sample point is determined from the information of its nearest neighbor sample points. Second, the maximum and minimum distance method is introduced to determine the number of clusters in the data set, and thereby the cluster centers. Finally, the remaining sample points are assigned to clusters using a two-step allocation strategy. Experimental comparisons with the DPC, DBSCAN, AP, and K-Means algorithms show that the KNN-ADPC algorithm achieves better index values and cluster quality.

(3) A new cluster validity index (CPI) that combines clusters and sample points is proposed. The index introduces the compactness and separation of clusters together with the distances of sample points between clusters, and combines the two parts through a combination ratio coefficient, so that the structure of the data set can be recognized better. Experimental comparison of cluster validity indices shows that the CPI index not only has better evaluation performance, but also a wider range of application and higher stability.
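
As a rough illustration of the idea behind contribution (1), the sketch below increases K one step at a time and stops once the relative drop in SSE becomes small. It is not the thesis's algorithm: the abstract gives neither the stopping rule nor the details of the maximum distance median method, so the threshold `min_drop`, the limit `k_max`, the median-seeded farthest-point initialization, the synthetic data helper `demo_blobs`, and all function names are assumptions made for this example.

```python
import random
from statistics import median


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centers(points, k):
    """Assumed stand-in for the thesis's initialization: seed with the point
    nearest the coordinate-wise median, then repeatedly add the point that is
    farthest from its closest already-chosen center."""
    med = [median(col) for col in zip(*points)]
    centers = [list(min(points, key=lambda p: sq_dist(p, med)))]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(points, k, n_iter=100):
    """Lloyd's K-Means; returns (labels, sse) for the final partition."""
    centers = init_centers(points, k)
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[j])
        converged = new_centers == centers
        centers = new_centers
        if converged:
            break
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, sse


def choose_k(points, k_max=10, min_drop=0.3):
    """Grow K while SSE still falls by at least `min_drop` (relative drop).
    Both the rule and its default threshold are illustrative assumptions."""
    best_k = 1
    best_labels, prev_sse = kmeans(points, 1)
    for k in range(2, min(k_max, len(points)) + 1):
        labels, sse = kmeans(points, k)
        if prev_sse <= 0 or (prev_sse - sse) / prev_sse < min_drop:
            break  # SSE has flattened out: keep the previous K
        best_k, best_labels, prev_sse = k, labels, sse
    return best_k, best_labels


def demo_blobs(seed=0, n=50):
    """Hypothetical test data: three well-separated 2-D Gaussian blobs."""
    rng = random.Random(seed)
    blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    return [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in blob_centers for _ in range(n)]
```

Calling `choose_k(demo_blobs())` on these three compact, well-separated groups should stop at K = 3, because moving from 3 to 4 clusters only splits an already compact group and barely reduces SSE. A fixed relative threshold is only one way to read "the change trend of the SSE value"; on overlapping or unevenly sized clusters it may stop too early or too late.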
Keywords/Search Tags: Clustering algorithm, K-Means, density peak, K-nearest neighbor, cluster validity index