Research On Clustering Algorithm Based On Density Peak And Its Application In Text Clustering

Posted on:2020-11-21

Degree:Master

Type:Thesis

Country:China

Candidate:Q J Bu

Full Text:PDF

GTID:2428330578955271

Subject:Computer Science and Technology

Abstract/Summary:

Unsupervised learning can extract valuable information from massive data without sufficient prior knowledge. As a typical unsupervised method, clustering has important applications in information retrieval, intrusion detection, pattern recognition, and psychology. Clustering by fast search and find of density peaks (CFSFDP) is a recent clustering algorithm that needs few parameters and can handle clusters of arbitrary shape. However, CFSFDP has two disadvantages: (1) the local density of a sample is sensitive to the cutoff distance; (2) cluster centers are selected manually, based on experience, which easily leads to misclassified clusters.

To address these problems, this thesis proposes an improved CFSFDP algorithm. The algorithm computes the local density of each sample from the distribution of its k nearest neighbors, uses CFSFDP to find the candidate cluster centers, and then applies an improved genetic k-means to select the optimal cluster centers automatically. The algorithm selects cluster centers more accurately than the original CFSFDP and can effectively process data sets with large density peaks or large differences among clusters. Experiments were carried out on UCI data sets, comparing the improved algorithm with CFSFDP, GKA, and k-means. Finally, the improved algorithm was applied to text clustering, with experiments on the Sogou text corpus. The experimental comparisons verify the clustering effectiveness of the improved CFSFDP algorithm.

This thesis makes the following improvements:
(1) Redefining the sample local density based on k nearest neighbors. The reference range in the density calculation is reduced to the k nearest neighbors of a sample, and the mean distance to those neighbors is introduced into the calculation, so that the local density no longer depends on the cutoff distance.
(2) Combining genetic k-means to select cluster centers automatically. The global search ability of genetic k-means is used to search for the optimal cluster centers among the candidate centers obtained by CFSFDP, which solves the problem of inappropriate cluster center selection in CFSFDP.
(3) Proposing an adaptive crossover probability that combines the population's generation count and its convergence, to avoid premature convergence during the genetic k-means iterations.
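The first improvement and the CFSFDP candidate search can be illustrated with a minimal sketch. The abstract does not give the exact density formula, so the form used here, rho = exp(-mean distance to the k nearest neighbors), is an assumption chosen only because it involves the neighbor distance mean and no cutoff distance. The function name `knn_density_peaks` and the product score `gamma = rho * delta` for ranking candidates are likewise illustrative, not taken from the thesis. The delta term follows the standard CFSFDP definition: the distance to the nearest sample of higher density.

```python
import numpy as np

def knn_density_peaks(X, k):
    """Return (rho, delta) for each row of X.

    rho   : kNN-based local density. ASSUMED form, not from the thesis:
            exp(-mean distance to the k nearest neighbors).
    delta : distance to the nearest sample with higher density
            (standard CFSFDP definition).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Pairwise Euclidean distance matrix, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Sort each row; column 0 is the sample itself (distance 0),
    # so columns 1..k hold the k nearest neighbors.
    knn_dist = np.sort(dist, axis=1)[:, 1:k + 1]
    rho = np.exp(-knn_dist.mean(axis=1))

    # Visit samples in order of decreasing density; ties are broken
    # by position in this ordering.
    order = np.argsort(-rho)
    delta = np.empty(n)
    # The densest sample has no denser neighbor: use its largest distance.
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        delta[i] = dist[i, order[:rank]].min()
    return rho, delta
```

Samples with both high rho and high delta are the candidate cluster centers. In this sketch they could be ranked by `gamma = rho * delta`, with the top-ranked samples handed to the genetic k-means stage for the final selection.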

Keywords/Search Tags:

text clustering, CFSFDP, sample local density, cluster centers, genetic k-means

Related items

1	Research On The Selection Of Initial Cluster Centers In K-means Algorithm
2	The Research And Application Of Text Clustering Based On Improved K-means Algorithm
3	Study On Problems To Select Initial Cluster Centers Of The K-means Algorithm
4	Research On Improved Density Peak Clustering Algorithm
5	Research And Application Of DNA Genetic Algorithm Based On P System In Cluster Analysis
6	Improved K-means Clustering Based On Genetic Algorithm
7	Research On Density Cluster Centers Constrained Hierarchical Clustering
8	Partition-based Clustering Research And Its Application In Web Mining
9	Research On The Telecommunication Complaint Text Clustering Based On Improved CFSFDP Algorithm
10	Text Clustering And Its Application Based On CFSFDP Algorithm