Research on the Clustering by Fast Search and Find of Density Peaks Algorithm Based on the K Nearest Neighbor Approach

Posted on: 2019-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: P L Jia
Full Text: PDF
GTID: 2428330578972750
Subject: Computer application technology
Abstract/Summary:
With the rapid development of artificial intelligence (AI) theory and technology, data mining and machine learning, which are among the important fields of AI, are constantly influencing and changing people's work and life. So far, machine learning has been applied in many fields, such as prediction, quantitative trading, text and voice information processing, image recognition, automatic driving, and personalized recommendation. These technologies will undoubtedly make our lives more convenient. Cluster analysis is an important technique in these applications. Clustering by fast search and find of density peaks (DPC) is a novel clustering algorithm proposed in 2014. It clusters samples using their distribution density and the distances between them. The DPC procedure is simple and efficient. However, the traditional DPC algorithm has difficulty identifying clusters of different densities, finding clusters with arbitrary shapes, and removing noisy data. To address these problems, two improved DPC algorithms based on the K nearest neighbor approach are proposed. The main work of this dissertation is as follows:

(1) Because the traditional DPC algorithm cannot effectively find clusters with different densities, a clustering by fast search and find of density peaks algorithm based on the K nearest neighbor graph (KG-DPC) is proposed. KG-DPC defines a new density estimation function from each sample's K nearest neighbor set and the corresponding set of K nearest neighbor distances, so that the sample density is estimated from the K nearest neighbor set. The cluster centers are then selected using the decision graph, and the sample distances are calculated. Finally, these sample distances are used for quadratic clustering. The experimental results show that the KG-DPC algorithm is better than the DPC algorithm at distinguishing clusters of different densities.

(2) Because the DPC algorithm cannot identify noisy data or find clusters with arbitrary shapes, a clustering by fast search and find of density peaks algorithm based on noise removal (NR-DPC) is proposed. First, the K nearest neighbor distance set is used to estimate the noise indicators. The data set is stratified by these noise indicators, and the samples with high densities are selected. Then the noise indicators are used to estimate the sample densities and calculate the sample distances. The decision graph is used to select the cluster centers. Finally, the DBSCAN algorithm is used to expand clusters from the cluster centers with the highest densities. Extensive experiments show that the NR-DPC algorithm is superior to the DPC algorithm on most metrics. The clustering results on text test data also show that the accuracy of the NR-DPC algorithm improves markedly.
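The baseline DPC procedure summarized above (estimate a density for each sample, measure the distance to the nearest sample of higher density, pick centers, then assign the rest) can be sketched as follows. This is a minimal illustration under stated assumptions, not the KG-DPC or NR-DPC method of the thesis: the abstract does not give their density formulas, so the K nearest neighbor density used here (the exponential of the negative mean distance to the k nearest neighbors) is a common stand-in. The function names, the parameters `k` and `n_clusters`, and the ranking by density times distance in place of a manually read decision graph are all hypothetical choices made for this example.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def knn_density(D, k):
    """Assumed stand-in density: exp(-mean distance to the k nearest neighbors)."""
    # Column 0 of each sorted row is the zero self-distance, so skip it.
    knn = np.sort(D, axis=1)[:, 1:k + 1]
    return np.exp(-knn.mean(axis=1))

def dpc_delta(D, rho):
    """For each sample, distance to (and index of) the nearest denser sample."""
    n = len(rho)
    order = np.argsort(-rho, kind="stable")  # indices by decreasing density
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest sample has no denser neighbor; use its largest distance.
            delta[i] = D[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(D[i, higher])]
        delta[i] = D[i, j]
        parent[i] = j
    return delta, parent, order

def dpc_cluster(X, k=5, n_clusters=2):
    """Baseline DPC with a kNN density; returns (labels, center indices)."""
    D = pairwise_distances(X)
    rho = knn_density(D, k)
    delta, parent, order = dpc_delta(D, rho)
    # Stand-in for reading the decision graph: rank samples by rho * delta.
    gamma = rho * delta
    gamma[order[0]] = np.inf  # the global density peak is always a center
    centers = np.argsort(-gamma)[:n_clusters]
    labels = np.full(len(X), -1)
    labels[centers] = np.arange(len(centers))
    # Visit samples by decreasing density so each parent is labeled first.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

Calling `dpc_cluster(X, k=5, n_clusters=2)` on an array `X` of shape `(n_samples, n_features)` returns one label per sample and the indices of the chosen centers. Because every non-center sample simply inherits the label of its nearest denser neighbor, a poor density estimate propagates through the whole assignment, which is the weakness that the K nearest neighbor based variants described above are meant to address.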
Keywords/Search Tags: DPC algorithm, K nearest neighbor approach, noisy data, quadratic clustering