Density-based Uncertain Data Clustering Algorithm

Posted on: 2020-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: M J Zhao
Full Text: PDF
GTID: 2428330590472545
Subject: Applied Mathematics
Abstract/Summary:
In recent years, with the rapid development of network technology, information science has entered the era of big data. Large amounts of data have emerged in many industries, and mining uncertain data has become a widely studied problem. How to select effective methods to extract the potential value of uncertain data has been a research focus in the data mining field. Cluster analysis, an indispensable part of data mining, is important for processing uncertain data. Density-based clustering algorithms can identify clusters of arbitrary shape and are insensitive to noise; representative algorithms include DBSCAN and NBC. However, traditional density-based clustering algorithms cannot be applied directly to uncertain data. This thesis studies how uncertain data arise, typical uncertain data clustering algorithms, and traditional clustering methods. On this basis, it proposes improvements to the NBC algorithm so that it can process uncertain data. The thesis consists of the following two parts:

1. Existing uncertain data clustering algorithms have several problems in measuring the similarity between uncertain objects, such as failing to fully account for the effects of uncertainty, high computational complexity, and limited clustering accuracy on data sets with noise and overlap. Therefore, this thesis first applies contact number and situation value theory to the DBSCAN and NBC algorithms and proposes the UDBSCAN and UNBC algorithms. In addition, to improve clustering of data sets with overlapping parts, the polynomial kernel distance is introduced into the UNBC algorithm, yielding the UNBCP algorithm. The resulting algorithm has advantages in clustering accuracy and computational complexity.

2. The uncertain data clustering algorithm FDBSCAN-KL uses K-L divergence to measure the similarity between uncertain data objects and achieves higher clustering precision on data sets with overlapping parts. However, owing to the limitations of K-L divergence itself, the algorithm does not handle completely separated uncertain data objects well. Therefore, this thesis proposes using J-S divergence to measure the similarity between uncertain data objects and applies this idea to the NBC algorithm, yielding the NBC-JS algorithm. Experiments show that the proposed algorithm not only preserves the advantages of FDBSCAN-KL but also efficiently clusters completely separated uncertain data sets. The algorithm has the advantages of low sensitivity and low computational complexity.
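
The limitation of K-L divergence on completely separated objects can be illustrated with a minimal sketch. It assumes each uncertain object is summarized as a discrete probability distribution over the same set of bins; that representation, the function names, and the sample values are hypothetical choices made for illustration, and this is not the thesis's NBC-JS implementation. When two distributions have disjoint support, K-L divergence is infinite and so cannot rank such pairs, whereas J-S divergence stays finite, is symmetric, and is bounded above by ln 2.

```python
import math


def kl_divergence(p, q):
    """K-L divergence D(p || q) in nats for discrete distributions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # a term with p_i = 0 contributes 0 by convention
        if qi == 0.0:
            return math.inf  # p has mass where q has none
        total += pi * math.log(pi / qi)
    return total


def js_divergence(p, q):
    """J-S divergence: mean K-L divergence of p and q to their midpoint."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical uncertain objects, each a distribution over four shared bins.
overlap_a = [0.4, 0.3, 0.2, 0.1]
overlap_b = [0.1, 0.2, 0.3, 0.4]
separated_a = [0.5, 0.5, 0.0, 0.0]
separated_b = [0.0, 0.0, 0.5, 0.5]

kl_separated = kl_divergence(separated_a, separated_b)  # infinite
js_separated = js_divergence(separated_a, separated_b)  # equals ln 2
js_overlap = js_divergence(overlap_a, overlap_b)        # between 0 and ln 2
```

In a density-based algorithm, a bounded and symmetric measure of this kind can be compared against a fixed neighborhood threshold even for non-overlapping objects, which is one plausible reading of why the thesis replaces K-L divergence with J-S divergence.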
Keywords/Search Tags: density, clustering analysis, uncertainty data, NBC algorithm, contact number, K-L divergence, J-S divergence