Research On Density Peaks Clustering Algorithm Based On Adaptive Partitioning

Posted on: 2021-05-01 | Degree: Master | Type: Thesis
Country: China | Candidate: C X Hong
GTID: 2428330623467332 | Subject: Electronic and communication engineering
Abstract/Summary:
In recent years, the spread of data-intensive applications and mature database technologies has led to explosive growth in the volume of data generated by human society. To extract useful information from these data, a wide variety of data mining methods continue to be proposed. Cluster analysis is an important part of data mining: it accounts for the largest share of unsupervised learning tasks and has been widely applied in pattern recognition, image processing, machine learning, web search, marketing, and other fields. The density peaks clustering (DPC) algorithm is a typical density-based clustering algorithm. Compared with traditional density-based clustering methods, DPC is novel, concise, and efficient, and it can discover clusters of arbitrary shape. However, with the arrival of the big data era, large-scale data sets are generated continually and the demands placed on clustering algorithms keep rising. A detailed analysis of the DPC clustering process shows that the algorithm must compute the distance between every pair of data points, so its cost grows quadratically with the number of points. On large-scale data in particular, DPC is therefore inefficient. To improve DPC so that it can process large-scale data efficiently, this thesis makes the following main contributions:

1. A density peaks clustering algorithm based on an improved grid (G-DPC) is proposed. To address the inefficiency of DPC on large-scale data sets, this thesis combines DPC with grid-based clustering. Dividing the data objects into grid cells speeds up the algorithm and avoids memory overflow on an ordinary PC when processing large data sets. Moreover, traditional grid-based clustering algorithms require the number of grid cells or the cell size to be specified in advance; these parameters are usually hard to choose well, and different values can strongly affect the clustering results. To solve this parameter-selection problem, this thesis proposes a method that generates the grid size adaptively from characteristics of the data set itself.

2. The calculation of local density and relative distance is improved. Local density and relative distance are the two key quantities that determine the clustering quality of DPC. When computing the local density, the concept of an "impact function" is introduced to better account for the influence of surrounding data objects on each grid cell. In addition, to reflect the data distribution more objectively, a new way of computing the relative distance between grid cells is defined. Experimental results show that the improved grid-based density peaks clustering algorithm greatly improves the efficiency of DPC and outperforms the traditional grid-based density peaks clustering algorithm in clustering quality.

3. A density peaks clustering algorithm based on circular partitioning (C-DPC) is proposed. On high-dimensional data sets, the number of grid cells in G-DPC grows exponentially with the dimension, so it may exceed the number of sample points and leave the data sparsely spread across the cells; in that case G-DPC no longer improves on the efficiency of DPC. C-DPC therefore first divides the data space into overlapping circular regions and then clusters the data set to obtain the final result. The proposed algorithm is tested on several public synthetic data sets and real UCI data sets and compared with DPC. The results verify the validity and superiority of C-DPC, which also remedies the inefficiency of G-DPC on high-dimensional data sets.
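The cost of computing the distance between all data points, noted above, is easiest to see in a minimal sketch of baseline DPC. This follows the standard cutoff-kernel formulation (local density ρ counts neighbours closer than a cutoff d_c, relative distance δ is the distance to the nearest point of higher density, and centres are points with large ρ·δ); it is not the thesis's improved method. The function name `dpc`, the parameters `d_c` and `n_clusters`, and the rule of taking the top ρ·δ points as centres instead of reading a decision graph are illustrative assumptions.

```python
import math


def dpc(points, d_c, n_clusters):
    """Baseline density peaks clustering, cutoff-kernel variant (illustrative sketch)."""
    n = len(points)
    # Full pairwise distance matrix: n * n entries, the quadratic cost DPC pays.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density rho: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Visit points from densest to sparsest; sorted() is stable, so ties keep index order.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    parent = [-1] * n  # nearest point that comes earlier in the density order
    delta[order[0]] = max(dist[order[0]])  # densest point: delta is its largest distance
    for pos in range(1, n):
        i = order[pos]
        j = min(order[:pos], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j
    # Centre score gamma = rho * delta; the global density peak is always a centre.
    gamma = [rho[i] * delta[i] for i in range(n)]
    gamma[order[0]] = math.inf
    centers = sorted(range(n), key=lambda i: -gamma[i])[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Every non-centre inherits the label of its nearest higher-density point,
    # which is already labelled because points are visited in density order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

Both the distance matrix and the density count touch all n² point pairs, which is exactly the work that grid and circular partitioning try to avoid on large data sets.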
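The abstract does not give G-DPC's adaptive grid-size rule, its impact function, or its inter-grid distance, so the following sketch fills them with hypothetical choices to show the general shape of a grid-level density. The number of intervals per axis is chosen so that uniformly spread data would put about `target_per_cell` points in each cell, and a cell's density is its own point count plus the counts of the other non-empty cells, damped by a Gaussian of the distance between cell centroids. The functions `adaptive_grid` and `grid_density`, the parameters `target_per_cell` and `width`, and the Gaussian form are all assumptions, not the thesis's definitions.

```python
import math
from collections import defaultdict


def adaptive_grid(points, target_per_cell=4):
    """Assign points to grid cells whose size is derived from the data (hypothetical rule)."""
    n, d = len(points), len(points[0])
    lo = [min(p[j] for p in points) for j in range(d)]
    hi = [max(p[j] for p in points) for j in range(d)]
    # Intervals per axis chosen so uniform data would give ~target_per_cell points per cell.
    k = max(1, math.ceil((n / target_per_cell) ** (1 / d)))
    # Cell side per axis; fall back to 1.0 when an axis has zero range.
    side = [(hi[j] - lo[j]) / k or 1.0 for j in range(d)]
    cells = defaultdict(list)  # only non-empty cells are ever stored
    for idx, p in enumerate(points):
        # Clamp so points on the upper boundary land in the last interval.
        key = tuple(min(int((p[j] - lo[j]) / side[j]), k - 1) for j in range(d))
        cells[key].append(idx)
    return dict(cells), side


def grid_density(points, cells, side, width=1.0):
    """Cell density: own count plus Gaussian-damped counts of other cells (assumed impact function)."""
    d = len(side)
    centroid = {c: tuple(sum(points[i][j] for i in idx) / len(idx) for j in range(d))
                for c, idx in cells.items()}
    scale = width * math.hypot(*side)  # influence fades over about one cell diagonal
    rho = {}
    for c, idx in cells.items():
        impact = sum(len(members) * math.exp(-(math.dist(centroid[c], centroid[o]) / scale) ** 2)
                     for o, members in cells.items() if o != c)
        rho[c] = len(idx) + impact
    return rho, centroid
```

Because only occupied cells are kept, the density step costs O(m²) for m non-empty cells rather than O(n²) for n points; the cells, with their centroids and densities, can then stand in for individual points when ρ and δ are computed.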
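The abstract also does not specify how C-DPC places its overlapping circular regions. One hypothetical way to build such a cover is a greedy leader scheme, sketched below: any point farther than `radius` from every existing centre seeds a new ball, and membership uses the enlarged radius `overlap * radius` so that neighbouring balls share points. The function `circular_partition` and its parameters are illustrative assumptions. The property it is meant to show is that, since every ball is seeded by a data point, the number of regions can never exceed the number of points, whatever the dimension.

```python
import math


def circular_partition(points, radius, overlap=1.5):
    """Cover the data with overlapping balls using a greedy leader scheme (hypothetical)."""
    centers = []
    for p in points:
        # A point farther than `radius` from every existing centre seeds a new ball.
        if all(math.dist(p, c) > radius for c in centers):
            centers.append(p)
    # Membership uses an enlarged radius, so neighbouring balls share boundary points.
    # Every point is within `radius` of some centre, so every point belongs to a ball.
    members = [[i for i, p in enumerate(points) if math.dist(p, c) <= overlap * radius]
               for c in centers]
    return centers, members
```

For comparison, a grid with only two intervals per axis already has 2^10 = 1024 cells in ten dimensions, so a few hundred 10-dimensional points would leave most cells empty, while the number of balls here stays bounded by the number of points.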
Keywords/Search Tags: clustering algorithm, density peaks, adaptive partition, grid partition, circular partition