
A Clustering Method Based On Sticky Hierarchical Dirichlet Process And Its Application

Posted on: 2020-05-04 | Degree: Master | Type: Thesis
Country: China | Candidate: N Zhang
GTID: 2428330599954372 | Subject: Statistics
Abstract/Summary:
When clustering multi-source data, the growing number of dimensions, the differing amounts of information carried by each index, and the different distributions the indices follow all increase the complexity of clustering and can reduce the accuracy of the results. Existing clustering methods assume that the indices are independent and ignore the differences in the information each index carries, so a new clustering method is needed to address these problems. Building on existing research, this thesis proposes an improved sticky hierarchical Dirichlet process for clustering multi-source data. The method accounts for the varying information content of the indices and uses sticky parameters to reflect the correlation between each index and the overall clustering.

The validity of the method is verified by clustering simulated data and the IRIS dataset. We find that the more strongly the indices correlate with the overall clustering, the higher the accuracy of the overall clustering result. When some indices have high correlation and others low, the low-correlation indices have little effect on the accuracy of the sticky hierarchical Dirichlet process, which shows that the method is robust. We also find that the sticky parameter directly reflects the correlation between an index and the overall clustering. In addition, by studying how overall clustering accuracy depends on the number of indices under different sticky settings, we find that the overall clustering result can be reproduced by clustering only a few indices with large sticky parameters. Important indices can therefore be selected according to their sticky parameters, which reduces the dimension of the data and simplifies the model. On both the simulated data and the IRIS dataset, the sticky hierarchical Dirichlet process achieves higher accuracy than the other clustering methods compared: the longest distance (complete linkage) method, K-means, the PAM algorithm, principal component analysis, and the mixture model method. The advantage is especially clear when the data contain a few indices with strong correlation, showing that the method can significantly improve classification accuracy and is robust.

Finally, we cluster the 2016 agricultural modernization data of 85 prefecture-level cities in Guangdong, Henan, Anhui, Jiangsu and Sichuan provinces. The classification of agricultural modernization development levels produced by the sticky hierarchical Dirichlet process is markedly more reasonable than that of the other methods, indicating that the method can accurately assess the development level of these cities. We also find that the sticky parameters objectively and accurately reflect the importance of each index in this classification, so they can be used to identify the key factors that affect the development of agricultural modernization.
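The abstract does not state the form of the improved model, so the following is only an orientation sketch of the standard sticky HDP prior (the formulation Fox et al. introduced for hidden Markov models), in its finite weak-limit approximation: global weights beta ~ Dirichlet(gamma/L, ..., gamma/L) and, for each group j, pi_j ~ Dirichlet(alpha * beta + kappa * e_j), where the sticky parameter kappa adds extra mass to group j's own component. The function names `sample_sticky_weights` and `mean_self_weight`, the truncation level L, and the values of gamma, alpha and kappa are hypothetical choices for illustration; how the thesis links kappa to the correlation between an index and the overall clustering is not reproduced here.

```python
import numpy as np


def sample_sticky_weights(L, gamma, alpha, kappa, rng):
    """Draw (beta, pi) from a weak-limit (truncated) sticky HDP prior.

    beta  ~ Dirichlet(gamma/L, ..., gamma/L)        shared global weights
    pi[j] ~ Dirichlet(alpha * beta + kappa * e_j)   weights for group j
    The sticky parameter kappa adds extra mass to group j's own component.
    """
    beta = rng.dirichlet(np.full(L, gamma / L))
    pi = np.empty((L, L))
    for j in range(L):
        # Floor tiny values: Generator.dirichlet rejects non-positive parameters.
        conc = np.maximum(alpha * beta, 1e-10)
        conc[j] += kappa  # sticky bonus on the diagonal entry
        pi[j] = rng.dirichlet(conc)
    return beta, pi


def mean_self_weight(kappa, draws=2000, L=5, gamma=1.0, alpha=5.0, seed=0):
    """Monte Carlo average of the diagonal weights pi[j, j] under the prior."""
    rng = np.random.default_rng(seed)
    totals = [
        np.diag(sample_sticky_weights(L, gamma, alpha, kappa, rng)[1]).mean()
        for _ in range(draws)
    ]
    return float(np.mean(totals))


if __name__ == "__main__":
    for kappa in (0.0, 5.0, 20.0):
        print(f"kappa={kappa:5.1f}  mean self-weight={mean_self_weight(kappa):.3f}")
```

Under these assumed settings (L = 5, alpha = 5), the expected average self-weight is (alpha/L + kappa)/(alpha + kappa): about 0.2 with kappa = 0 and about 0.84 with kappa = 20, so the Monte Carlo estimates should land close to those values. This is the sense in which a larger sticky parameter ties a group more tightly to its own component.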
Keywords: Multi-source Data, Clustering Analysis, Sticky Hierarchical Dirichlet Process, Correlation of Index, Agricultural Modernization