
Algorithms Research Based On Multiple Hierarchical Clustering

Posted on: 2020-02-29 | Degree: Master | Type: Thesis
Country: China | Candidate: W Yan | Full Text: PDF
GTID: 2428330602451857 | Subject: Engineering
Abstract/Summary:
In recent years, big data, Internet+, and the cloud era have carried data mining technology into all walks of life, and clustering, as an important data mining method, has been widely applied in many fields. As data sets grow in size and data types become more diverse and complex, traditional clustering algorithms face severe challenges. First, most current clustering algorithms depend heavily on parameters and are computationally expensive; to address these problems, this thesis proposes a new efficient hierarchical clustering algorithm. Second, most existing clustering algorithms use the distance between samples as the similarity measure, which is inaccurate and strongly affected by noise points. To solve this problem, this thesis proposes a similarity measure based on sample distribution and, building on that measure, a hybrid clustering algorithm that combines density clustering and hierarchical clustering. The main contributions of this thesis are summarized as follows:

1. A new efficient hierarchical clustering algorithm is proposed. The algorithm consists of two stages: dividing and agglomerating. In the dividing stage, the initial data set is treated as a single class and is divided repeatedly until it yields more subclasses than the actual number of clusters. In the agglomerating stage, the subclasses that were over-divided during the dividing stage are agglomerated into the correct classes. To address the heavy computation of most hierarchical clustering algorithms, the dividing stage uses a method that finds the best dividing position according to the sample distribution. The method is accurate and efficient, and it avoids repeated calculation of the sample similarity matrix, which greatly reduces the amount of computation. For the agglomerating stage, an agglomerating strategy with label detection is proposed: by recording a dividing label and a dividing level during the dividing stage, it avoids unnecessary agglomeration checks between subclasses. This strategy greatly reduces the computation of the agglomerating stage and overcomes the drawback that the intermediate results of a hierarchical clustering algorithm cannot be reconstructed. The algorithm is accurate and efficient, and it requires no clustering parameters.

2. A hybrid clustering algorithm based on density clustering and hierarchical clustering is proposed. The algorithm consists of two clustering stages: density clustering in the first stage and hierarchical clustering in the second. In the density clustering stage, a method for automatically determining the cluster centers is proposed on the basis of the CFSFDP algorithm. The method uses the rate of change of the product of sample density and distance as the index for automatically selecting cluster centers, and the number of centers it selects is larger than the actual number of classes. This overcomes the shortcoming of the CFSFDP algorithm in selecting cluster centers. In the hierarchical clustering stage, the subclasses produced by the density clustering stage are agglomerated. For this purpose, a similarity measure between subclasses based on sample distribution is proposed; the measure takes full account of the distribution of samples and includes a noise point processing step, which makes it more effective than other similarity measures. By exploiting the characteristics of both density clustering and hierarchical clustering, the algorithm effectively combines the two types of algorithms and is more effective than previous hybrid clustering algorithms.
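The center-selection idea in the second contribution can be illustrated with a minimal sketch. It is not the thesis's method: the abstract does not give the exact selection rule, so the code below makes several assumptions that are labeled as such. The function names `density_peak_scores` and `select_centers`, the cutoff parameter `d_c`, the Gaussian density kernel, and the "largest drop between consecutive sorted products" rule are all hypothetical choices made for illustration. What it shows is the general CFSFDP-style computation of a density `rho` and a distance `delta` for every sample, followed by picking centers from the change in the sorted product `rho * delta`.

```python
import numpy as np

def density_peak_scores(X, d_c):
    """Return (rho, delta) for each sample, CFSFDP-style.

    Assumption: a Gaussian kernel is used for the local density so that
    ties in rho are unlikely; d_c is a hypothetical cutoff distance.
    """
    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Local density; subtract 1 to remove each sample's own contribution.
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0

    # delta: distance to the nearest sample with higher density.
    # The densest sample has no such neighbor, so it gets its largest distance.
    n = len(X)
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()
    return rho, delta

def select_centers(rho, delta):
    """Pick centers where the sorted product rho * delta drops the most.

    Assumption: the "rate of change" is taken as the difference between
    consecutive values of the product sorted in descending order.
    """
    gamma = rho * delta
    order = np.argsort(gamma)[::-1]      # sample indices, largest product first
    g = gamma[order]
    drops = g[:-1] - g[1:]               # change between consecutive products
    k = int(np.argmax(drops)) + 1        # keep everything before the largest drop
    return order[:k]
```

A possible usage, on any two-dimensional array `X` of samples, is `rho, delta = density_peak_scores(X, d_c=2.0)` followed by `centers = select_centers(rho, delta)`. The returned indices would then seed the subclasses that the hierarchical stage agglomerates; that second stage is not sketched here because the abstract does not specify its similarity measure.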
Keywords/Search Tags:Hierarchical Clustering, Hybrid Clustering, Dividing Strategy, Agglomerating Strategy, Similarity Measure, Noise Point Processing