
The Research And Application Of Potential Clustering

Posted on: 2019-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: X F Yu
Full Text: PDF
GTID: 2428330548476043
Subject: Computer Science and Technology
Abstract/Summary:
Clustering is an important branch of data mining. It groups data according to a clustering criterion without any prior knowledge of the underlying data distribution. In the era of big data, the need for research on clustering algorithms has become increasingly urgent. In 2013, Yonggang Lu et al. proposed a novel potential-based hierarchical agglomerative (PHA) clustering method. The method introduced a new similarity measure based on an analysis of the potential; it is faster, simpler in procedure, and can achieve more accurate clustering results. However, the PHA algorithm has several shortcomings: the number of clusters must be set manually; because it relies on hierarchical clustering, its mechanism for determining cluster membership considers only distance and ignores the effect of potential; it cannot obtain ideal clustering results on data sets with complex structure; it cannot identify noise points; and on noisy data sets its clustering results are easily affected by the noise points. These shortcomings limit the effectiveness of the PHA algorithm in practice. This thesis therefore addresses these shortcomings and applies the improved algorithm to specific fields. Its content covers the following aspects:

(1) To address the need to specify the number of clusters manually and the defect in the assignment mechanism, an algorithm that automatically determines cluster centers based on potential is proposed. First, the potential of each point is calculated and its parent node is determined. Next, the product of each point's potential and its distance to its parent node is computed, and the cluster centers of the data set are determined from this product. Finally, each remaining data point is assigned to the same cluster as its parent node until all data points have been assigned. Theoretical proofs and experiments show that the new algorithm retains the advantages of the PHA algorithm while determining cluster centers automatically. In addition, the improved sample assignment mechanism yields better clustering results.

(2) To address the PHA algorithm's inability to handle data sets with complex structure and the vulnerability of its results to noise, a potential-based hierarchical clustering algorithm for noisy data sets with complex structure is proposed. First, a potential increasing curve is constructed, and noise data are identified from the inflection point on the curve. Then, according to their potential, the normal points are stratified into maximum-layer and minimum-layer data, and the two layers are clustered automatically. Finally, starting from the clustering results of these two layers, the entire data set is clustered hierarchically using the shortest-distance measure until the category of every data point is determined. Theoretical proofs and experiments show that the new algorithm handles complex-structure data sets effectively, that its results are not affected by noise data, and that it produces better clustering results.

(3) The new algorithm and the PHA algorithm are applied to earthquake magnitude clustering and to air quality level clustering for key environmental cities. The data were first collected, and clustering algorithms were then used to cluster the seismic data and the data reflecting urban air quality. Finally, the clustering performance of the different algorithms was analyzed experimentally, which demonstrates the application value of the proposed algorithm.
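
The three steps of item (1) (potential, parent node, potential-distance product) can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and the abstract does not give the formulas, so several choices here are assumptions: the gravity-like truncated potential -1/max(r, delta), the default value of `delta`, the definition of the parent as the nearest point with lower potential, the use of the potential's magnitude in the product, and the largest-gap rule for choosing how many centers to keep. The function name `potential_center_clustering` is hypothetical.

```python
import numpy as np

def potential_center_clustering(X, delta=None):
    """Sketch of potential-based center selection (assumes at least 2 points)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    if delta is None:
        # Assumption: truncation radius = one tenth of the mean pairwise distance.
        delta = D[~np.eye(n, dtype=bool)].mean() / 10.0

    # Assumed potential: each other point contributes -1 / max(r, delta).
    contrib = -1.0 / np.maximum(D, delta)
    np.fill_diagonal(contrib, 0.0)
    phi = contrib.sum(axis=1)

    # Visit points from lowest to highest potential; the first one is the root.
    order = np.argsort(phi, kind="stable")
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(n)

    parent = np.full(n, -1)
    dist_to_parent = np.zeros(n)
    root = order[0]
    dist_to_parent[root] = D[root].max()  # the root has no parent
    for r in range(1, n):
        i = order[r]
        lower = order[:r]  # points with lower potential than i
        j = lower[np.argmin(D[i, lower])]
        parent[i] = j
        dist_to_parent[i] = D[i, j]

    # Product of potential magnitude and distance to parent.
    score = np.abs(phi) * dist_to_parent
    # Sort by descending score; ties are broken by potential rank (root first).
    ranked = np.lexsort((rank, -score))
    # Assumed rule: cut the ranking at the largest drop in score.
    gaps = score[ranked[:-1]] - score[ranked[1:]]
    k = int(np.argmax(gaps)) + 1
    centers = ranked[:k]

    # Every remaining point inherits the label of its parent.
    labels = np.full(n, -1)
    labels[centers] = np.arange(k)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

Calling `labels, centers = potential_center_clustering(X)` on an array of shape (n_points, n_features) returns one integer label per point and the indices of the chosen centers. Because points are labeled in order of increasing potential, each point's parent already has a label when the point is reached.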
Keywords/Search Tags: potential clustering, hierarchical clustering, cluster center, potential increasing curve, potential hierarchy, earthquake magnitude clustering, urban air quality assessment