Optimization Research Based On Density Peak Clustering Algorithm

Posted on: 2024-08-24
Degree: Master
Type: Thesis
Country: China
Candidate: T A He
GTID: 2568306932459664
Subject: Mathematics
Abstract/Summary:
With the rapid development of science and technology, the era of big data has arrived. Work, study, and daily life all generate large volumes of complex data, and how to mine valuable information from these data has become a topic of wide concern. Clustering is a data analysis and data mining technique and an unsupervised machine learning method that is used in many fields, including object recognition, natural language processing, and image retrieval. The density peak clustering algorithm is a relatively advanced density-based clustering algorithm. Compared with other clustering algorithms, it is simple and efficient, requires few parameters, and can identify clusters of arbitrary shape. After a comprehensive study and analysis of the density peak clustering algorithm, this thesis proposes improved clustering algorithms that address its shortcomings. The main work is as follows.

This thesis first describes the background and significance of the topic, then briefly summarizes the current state of research and the open problems, and introduces the theoretical basis.

To address the ambiguous cluster centers and misassigned data points that arise when the density peak clustering algorithm processes high-dimensional complex data, the t-SNE algorithm is introduced and a density peak clustering algorithm based on t-SNE dimensionality reduction is proposed. First, t-SNE is used to preprocess the high-dimensional data: similarity between data points is represented by probability distributions, that is, the relationships between high-dimensional data points are mapped to a low-dimensional space as probability distributions, so that the information carried in the high-dimensional space is preserved in the low-dimensional space. The essential features of the data are retained as far as possible by minimizing the relative entropy (Kullback-Leibler divergence) between the two distributions, and the density peak clustering algorithm is then used to cluster the reduced data. The product of local density and relative distance is used to select the cluster centers. The experimental results show that the new algorithm clusters high-dimensional complex data efficiently, effectively reduces data redundancy, and improves clustering performance, which demonstrates its feasibility.

To address two further problems of the density peak clustering algorithm, namely that the traditional distance measure does not reflect the data distribution well and that the cutoff distance parameter is chosen subjectively, an improved density peak clustering algorithm based on the sparrow search algorithm is designed. The algorithm is improved in two respects. First, the distance measure is changed: the Euclidean distance of the original algorithm is replaced with the standardized Euclidean distance, which better reflects the distribution and characteristics of the data. Second, exploiting the strong global optimization ability of the sparrow search algorithm, the NMI (normalized mutual information) index is used as the fitness function, a value range is set, and the cutoff distance is optimized. Experiments on artificial and real data sets show that the new algorithm effectively reflects the data distribution, determines the cutoff distance parameter automatically, clusters arbitrarily shaped clusters well, and significantly improves clustering quality and efficiency.
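
The core steps named in the abstract (local density, relative distance, and their product as the center-selection score) can be illustrated with a minimal sketch. This is not the thesis implementation. It assumes the cutoff-kernel density of the original density peak algorithm (a count of neighbors closer than the cutoff distance), and it assumes that "standard Euclidean distance" means the variance-scaled (standardized) Euclidean distance. The function names, the toy data, the cutoff value `d_c=0.25`, and the fixed `n_clusters` are all hypothetical. The t-SNE preprocessing and the sparrow search tuning of the cutoff distance are not shown.

```python
import math


def standardized_euclidean(points):
    """Pairwise distances with every feature scaled by its standard deviation."""
    n, dims = len(points), len(points[0])
    stds = []
    for k in range(dims):
        col = [p[k] for p in points]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        stds.append(math.sqrt(var) or 1.0)  # constant feature: avoid division by zero
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum(((points[i][k] - points[j][k]) / stds[k]) ** 2
                              for k in range(dims)))
            dist[i][j] = dist[j][i] = d
    return dist


def density_peaks(dist, d_c, n_clusters):
    """Cluster from a distance matrix; returns (labels, center indices)."""
    n = len(dist)
    # Local density rho: number of other points closer than the cutoff distance d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Visit points from highest to lowest density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n   # relative distance: distance to the nearest denser point
    parent = [-1] * n   # index of that nearest denser point
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point: use its largest distance
        else:
            j = min(order[:rank], key=lambda j: dist[i][j])
            delta[i], parent[i] = dist[i][j], j
    # Center score gamma = rho * delta; the highest scores become cluster centers.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(order, key=lambda i: -gamma[i])[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Every remaining point inherits the label of its nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


# Toy data (hypothetical): two well-separated square blobs of five points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
dist = standardized_euclidean(points)
labels, centers = density_peaks(dist, d_c=0.25, n_clusters=2)
print(labels)   # -> [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(centers)  # -> [4, 9]
```

In this sketch the cutoff distance is fixed by hand, which is exactly the subjective choice the abstract sets out to remove. A tuning procedure such as the one described would instead evaluate candidate cutoff values within a set range and keep the one with the best fitness score.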
Keywords/Search Tags: clustering, peak density, cutoff distance, standardized Euclidean distance