Density-based Uncertain Data Clustering Algorithm

Posted on:2020-07-24

Degree:Master

Type:Thesis

Country:China

Candidate:M J Zhao

Full Text:PDF

GTID:2428330590472545

Subject:Applied Mathematics

Abstract/Summary:

In recent years, with the rapid development of network technology, information science has entered the era of big data. Large amounts of data have emerged across many industries, and mining uncertain data has become a research hotspot. How to choose effective methods for extracting the latent value of uncertain data has become a focus of data-mining research. Cluster analysis is of great significance for processing uncertain data and is an indispensable part of data mining. Density-based clustering algorithms can identify clusters of arbitrary shape and are insensitive to noise; representative algorithms include DBSCAN and NBC. However, traditional density-based clustering algorithms cannot be applied directly to uncertain data. This thesis studies in depth how uncertain data arise, typical uncertain-data clustering algorithms, and classical clustering theory. On this basis, it proposes reasonable and effective ways to improve the NBC algorithm so that it can process uncertain data. The work consists of two parts:

1. Existing uncertain-data clustering algorithms have several problems in measuring the similarity between uncertain objects: they do not fully account for the effects of uncertainty, they have high computational complexity, and their clustering accuracy is low on data sets with noise and overlap. This thesis therefore first applies contact number and situation value theory to the DBSCAN and NBC algorithms, yielding the UDBSCAN and UNBC algorithms. In addition, to improve clustering of data sets with overlapping parts, a polynomial kernel distance is introduced into UNBC, giving the UNBCP algorithm, which has advantages in both clustering accuracy and computational complexity.

2. The uncertain-data clustering algorithm FDBSCAN-KL uses the K-L divergence to measure the similarity between uncertain data objects and achieves high clustering precision on data sets with overlapping parts. However, because of limitations of the K-L divergence itself, the algorithm does not handle completely separated uncertain data objects well. This thesis therefore proposes using the J-S divergence to measure the similarity between uncertain data objects and applies this idea to the NBC algorithm, yielding the NBC-JS algorithm. Experiments show that the proposed algorithm retains the advantages of FDBSCAN-KL while also clustering completely separated uncertain data sets efficiently. The algorithm has low sensitivity and low computational complexity.
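
The second contribution rests on a property of the two divergences: the K-L divergence becomes infinite when one uncertain object assigns probability to a point where the other assigns none, whereas the J-S divergence is always finite (at most ln 2 with natural logarithms) and symmetric. The Python sketch below illustrates this under a stated assumption: each uncertain object is represented as a discrete probability distribution over a shared set of sample points. The function names and toy objects are hypothetical, and this is not the thesis's NBC-JS implementation.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for discrete distributions over the same sample points.

    Assumption: p and q are equal-length lists of probabilities summing to 1.
    Returns math.inf when P has mass at a point where Q has none.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # convention: 0 * log(0 / q) contributes 0
        if qi == 0:
            return math.inf  # unbounded: Q cannot "explain" P's mass here
        total += pi * math.log(pi / qi)
    return total

def js_divergence(p, q):
    """JS(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M = (P + Q) / 2.

    M is positive wherever P or Q is, so both KL terms stay finite.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical uncertain objects over four sample points.
a = [0.5, 0.5, 0.0, 0.0]
b = [0.4, 0.6, 0.0, 0.0]  # overlaps with a
c = [0.0, 0.0, 0.5, 0.5]  # completely separated from a

print(kl_divergence(a, c))  # inf
print(js_divergence(a, c))  # ln 2, about 0.6931
print(js_divergence(a, b) < js_divergence(a, c))  # True
```

Because the J-S value is bounded and symmetric, a neighborhood-based method such as NBC can still rank every pair of objects, including pairs with disjoint support, where a K-L based measure would only report infinity.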

Keywords/Search Tags:

density, clustering analysis, uncertainty data, NBC algorithm, contact number, K-L divergence, J-S divergence

Related items

1	Studies On Semi-supervised Clustering Algorithms Based On Entropy And Divergence
2	A K-anonymity Algorithm Based On Jensen-Shannon Divergence
3	Military Strong Laser Beam Parameters Test Research Of Divergence Angle And Energy Density
4	Research On Clustering Algorithm Based On Cluster Center Selection
5	Research On Collaborative Filtering Recommendation Algorithm Based On KL Divergence
6	Study On The Spatiotemporal Characteristics Of Human Convergence And Divergence Based On Massive Mobile Phone Location Data
7	Research On Control Divergence Optimization Of GPGPU
8	Image Cosegmentation Method Based On Minimum Fuzzy Divergence
9	New Method On Information Divergence In Medical Tomographic Imaging
10	Statistical Complexity Measure Analysis Of Gait Signal Based On LMCD And JSD