
Research And Application On Clustering By Fast Search And Find Of Density Peaks

Posted on: 2020-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: C T Chen
GTID: 2428330596468138
Subject: Computer Science and Technology
Abstract/Summary:
Clustering is a form of unsupervised machine learning. Without prior knowledge, the data are divided into sets, called clusters, according to the similarity between data points. Clustering is widely used in fields such as natural science, mechanical engineering, and biomedicine, so high-quality clustering algorithms are of great significance to both academia and industry. According to the state of the data, clustering algorithms can be divided into classical clustering algorithms for static data and data stream clustering algorithms for data streams.

Clustering by fast search and find of density peaks (DP) is a new clustering algorithm based on local density and distance. It can find clusters of arbitrary shape, is simple and easy to understand, has few parameters, and partitions data efficiently. However, DP cannot handle multiple density peaks within a single cluster, and its data partitioning is unstable. In addition, when the differences between clusters are large enough, sparse and small clusters cannot be identified accurately. This thesis therefore focuses on improving the effectiveness and applicability of the DP algorithm, and proposes improved algorithms for static data and for stream data respectively. The main work is as follows:

1. For static data, an influence-space-based robust fast search and density peak clustering algorithm, I-DP, is proposed. The algorithm introduces the influence space, develops a new data partitioning strategy, and applies this strategy to data with high local density to improve the stability of the partitioning. A new formula for computing local density is also proposed, in which a weighted local density is calculated from neighboring data to improve the recognition of sparse and small clusters.

2. For data streams, density peak clustering based on empirical data analysis over data streams, EDA-DP, is proposed according to the features of data streams and built on the EDA framework. The algorithm uses the EDA framework to capture the data stream in real time and generate micro clusters without pre-clustering, dynamically adjusting the statistical information of each micro cluster. When a clustering request is received, EDA-DP executes the improved DP algorithm to generate a decision diagram and selects the central micro clusters to obtain the final data partition.

3. Finally, the application of classical clustering algorithms to text analysis is studied. The K-means, DP, and I-DP algorithms are combined with the VSM, LSI, and LDA text models to cluster Chinese and English texts according to the similarity between texts.

Experimental results show that the I-DP and EDA-DP algorithms proposed in this thesis achieve better results on various indices. In text analysis, the I-DP algorithm improves the F1 index by 9% compared with the DP algorithm.
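For readers unfamiliar with the baseline, the following is a minimal sketch of the basic DP procedure that the abstract builds on. It is not the I-DP or EDA-DP algorithm of this thesis. The function name `density_peaks`, the cutoff-count density, and the choice of centers by the largest product of density and distance (in place of reading them off the decision diagram by eye) are assumptions made for illustration.

```python
import math


def density_peaks(points, d_c, n_clusters):
    """Basic density-peaks clustering sketch (hypothetical helper, not I-DP)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta_i: distance to the nearest point that comes earlier in `order`
    # (a point of higher density); `parent` remembers which point that was.
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbor: use its max distance.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j

    # Assumption: pick centers as the n_clusters largest rho * delta values,
    # a stand-in for choosing outliers on the decision diagram by hand.
    centers = sorted(order, key=lambda i: -(rho[i] * delta[i]))[:n_clusters]

    labels = [-1] * n
    for cluster_id, i in enumerate(centers):
        labels[i] = cluster_id
    # Each remaining point inherits the label of its nearest denser neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return rho, delta, labels
```

Because every non-center point simply inherits the label of its nearest denser neighbor, one wrong assignment near a peak propagates to all points below it, which is one way to picture the unstable partitioning that the abstract attributes to DP.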
Keywords/Search Tags: DP algorithm, influence space, data stream, EDA framework, text models