
Research on Clustering Algorithm Based on Density Peak and Its Application in Text Clustering

Posted on: 2020-11-21
Degree: Master
Type: Thesis
Country: China
Candidate: Q J Bu
GTID: 2428330578955271
Subject: Computer Science and Technology
Abstract/Summary:
Unsupervised learning can extract valuable information from massive data without sufficient prior knowledge. As a typical unsupervised method, clustering has important applications in information retrieval, intrusion detection, pattern recognition, and psychology. Clustering by fast search and find of density peaks (CFSFDP) is a relatively new clustering algorithm that needs only a few parameters and can handle clusters of arbitrary shape. However, CFSFDP has two drawbacks: (1) the local density of a sample is sensitive to the cutoff distance; (2) selecting cluster centers manually from experience can easily lead to misclassified clusters. To address these problems, this paper proposes an improved CFSFDP algorithm. The algorithm computes the local density of each sample from the distribution of its k nearest neighbors, uses CFSFDP to find the possible cluster centers, and then applies an improved genetic k-means to select the optimal cluster centers automatically. The improved algorithm selects cluster centers more accurately than the original CFSFDP and can effectively process data sets with multiple density peaks or large differences among clusters. Experiments were carried out on UCI data sets, comparing the improved algorithm with CFSFDP, GKA, and k-means. Finally, the improved algorithm was applied to text clustering, with experiments on the Sogou text corpus. These experimental comparisons verify the clustering effectiveness of the improved CFSFDP algorithm.

This paper makes the following main improvements:
(1) Redefining the local density of a sample based on its k nearest neighbors. The reference range of the density calculation is reduced to the k nearest neighbors, and the mean distance to those neighbors is introduced into the calculation, so that the cutoff distance no longer affects the local density.
(2) Combining genetic k-means to select cluster centers automatically. The global search ability of genetic k-means is used to search for the optimal cluster centers among the possible centers found by CFSFDP, which solves the problem of inappropriate center selection in CFSFDP.
(3) Proposing an adaptive crossover probability that combines the number of evolutionary generations of the population with its degree of convergence, to avoid premature convergence during the iterations of genetic k-means.
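The abstract does not state the exact k-neighbor density formula. As a minimal sketch, assuming the DPC-KNN style definition ρ_i = exp(-(1/k) Σ_{j∈kNN(i)} d_ij²), the Python below computes each sample's local density without any cutoff distance, the CFSFDP distance δ_i to the nearest sample of higher density, and ranks samples by γ_i = ρ_i·δ_i to produce candidate centers. The function names (`knn_density`, `delta_distance`, `candidate_centers`) and the toy two-blob data are illustrative assumptions, not taken from the thesis.

```python
import math
import random


def pairwise_distances(points):
    """Full Euclidean distance matrix as a list of lists."""
    return [[math.dist(p, q) for q in points] for p in points]


def knn_density(dist, k):
    """Assumed k-neighbor density: exp(-mean squared distance to the k nearest neighbors)."""
    rho = []
    for i, row in enumerate(dist):
        # k smallest distances to *other* samples; no cutoff distance is involved
        nearest = sorted(d for j, d in enumerate(row) if j != i)[:k]
        rho.append(math.exp(-sum(d * d for d in nearest) / k))
    return rho


def delta_distance(dist, rho):
    """CFSFDP delta: distance to the nearest sample with higher density."""
    n = len(rho)
    order = sorted(range(n), key=lambda i: -rho[i])  # decreasing density
    delta = [0.0] * n
    delta[order[0]] = max(dist[order[0]])  # densest sample: use its largest distance
    for rank in range(1, n):
        i = order[rank]
        delta[i] = min(dist[i][j] for j in order[:rank])
    return delta


def candidate_centers(points, k, m):
    """Return indices of the m samples with the largest gamma = rho * delta."""
    dist = pairwise_distances(points)
    rho = knn_density(dist, k)
    delta = delta_distance(dist, rho)
    gamma = [r * d for r, d in zip(rho, delta)]
    return sorted(range(len(points)), key=lambda i: -gamma[i])[:m]


# Toy data: two well-separated Gaussian blobs of 50 samples each.
random.seed(0)
points = [(random.gauss(0, 0.3), random.gauss(0, 0.3)) for _ in range(50)]
points += [(random.gauss(5, 0.3), random.gauss(5, 0.3)) for _ in range(50)]
centers = candidate_centers(points, k=5, m=2)
```

Here `centers` should hold one index from each blob. In the pipeline the abstract describes, m would presumably be set larger than the expected number of clusters, so that the genetic k-means step has several possible centers to choose from.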
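Improvements (2) and (3) can be illustrated with a small genetic search over a fixed list of candidate centers. This sketch rests on several assumptions that the abstract does not confirm: a binary encoding (bit i = 1 keeps candidate i), a single k-means pass as the GKA-style k-means operator, a fitness of 1/(1 + Davies-Bouldin index), and a hypothetical adaptive crossover rule p_c = p_min + (p_max - p_min)(1 - (t/T)(1 - f_avg/f_max)), which decays with generation t but stays high once the population has converged (f_avg close to f_max). All names and parameter values are illustrative.

```python
import math
import random


def kmeans_step(points, centers):
    """One k-means pass: assign each point to its nearest center, then recompute means."""
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        groups[nearest].append(p)
    if any(not g for g in groups):
        return None  # an empty cluster makes this selection of centers invalid
    means = [tuple(sum(axis) / len(g) for axis in zip(*g)) for g in groups]
    return groups, means


def fitness(points, candidates, mask):
    """Hypothetical fitness: 1 / (1 + Davies-Bouldin index) of the decoded clustering."""
    centers = [c for c, bit in zip(candidates, mask) if bit]
    if len(centers) < 2:
        return 0.0
    result = kmeans_step(points, centers)
    if result is None:
        return 0.0
    groups, means = result
    scatter = [sum(math.dist(p, m) for p in g) / len(g) for g, m in zip(groups, means)]
    k = len(means)
    db = sum(
        max((scatter[i] + scatter[j]) / max(math.dist(means[i], means[j]), 1e-12)
            for j in range(k) if j != i)
        for i in range(k)
    ) / k
    return 1.0 / (1.0 + db)


def adaptive_pc(t, generations, f_avg, f_max, p_min=0.5, p_max=0.9):
    """Hypothetical adaptive crossover probability (decays with t, rises on convergence)."""
    progress = t / generations
    convergence = f_avg / f_max if f_max > 0 else 1.0
    return p_min + (p_max - p_min) * (1.0 - progress * (1.0 - convergence))


def select_centers(points, candidates, pop_size=16, generations=25, seed=0):
    """Pick a subset of candidate centers with a small genetic algorithm."""
    rng = random.Random(seed)
    m = len(candidates)
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(points, candidates, mask)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(m)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for t in range(generations):
        scores = [score(ind) for ind in pop]
        p_c = adaptive_pc(t, generations, sum(scores) / len(scores), max(scores))
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            a = max(rng.sample(pop, 2), key=score)  # binary tournament selection
            b = max(rng.sample(pop, 2), key=score)
            child = a[:]
            if rng.random() < p_c:  # one-point crossover
                cut = rng.randrange(1, m)
                child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < 1.0 / m) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = children
        best = max(pop, key=score)
    return best, score(best)


# Toy data: three Gaussian blobs; the candidate list stands in for CFSFDP output.
random.seed(1)
points = [(random.gauss(cx, 0.4), random.gauss(cy, 0.4))
          for cx, cy in [(0, 0), (6, 0), (3, 5)] for _ in range(40)]
candidates = [(0, 0), (0.3, 0.2), (6, 0), (5.8, -0.3), (3, 5)]
mask, best_score = select_centers(points, candidates)
```

With this toy data the search should keep exactly one candidate per blob: selections that split a blob between two nearby candidates, or merge two blobs under one center, score worse under the Davies-Bouldin criterion. Caching fitness by mask keeps the cost low, because the number of distinct masks (2^m) is small when m is small.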
Keywords/Search Tags: text clustering, CFSFDP, sample local density, cluster centers, genetic k-means