An Adaptive Clustering Algorithm For Uncertain Data Based On Interval Number

Posted on: 2021-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: C H Li
Full Text: PDF
GTID: 2518306107977519
Subject: Computer Science and Technology
Abstract/Summary:
Limited hardware precision and human interference introduce uncertainty into the data collected by sensor networks, which hampers the application of such data. Uncertain data cannot describe the state of an object's attributes with exact values. As a result, clustering algorithms designed for certain data cannot complete the clustering task on uncertain data. In recent years, uncertain data clustering algorithms have been proposed in succession, but they still face the following problems:

(1) Operational efficiency. Uncertain data clustering algorithms perform a large number of meaningless instance-level distance calculations when computing the distance between uncertain objects, which makes the algorithms inefficient.
(2) Clustering accuracy. Some uncertain data models, and the related concepts used in the clustering process, destroy the integrity of the uncertain data information. This introduces precision errors into the clustered objects and reduces the clustering accuracy of the algorithms.
(3) Adaptability. Because density clustering is good at discovering non-spherical clusters, density-based clustering algorithms for uncertain data have been proposed. However, the problem of non-adaptive thresholds in density clustering has not been solved well.

The interval number model is commonly used in uncertainty decision analysis. It describes the probability distribution of possible attribute values based on the upper and lower limits of uncertain data, which preserves the integrity of the data information to the greatest extent. Therefore, to address the above problems, this thesis studies an adaptive clustering algorithm for uncertain data based on interval numbers. The main work includes:

(1) This thesis proposes a new uncertain data clustering algorithm, IN-DBSCAN (DBSCAN algorithm based on the Interval Number model). First, IN-DBSCAN constructs the interval number model to describe the data distribution information of uncertain instances, which preserves the integrity of the data information. Then, IN-DBSCAN designs an efficient distance calculation strategy to compute the distance between uncertain objects, which improves the operational efficiency of the algorithm. Finally, IN-DBSCAN redefines the related concepts of the classic density clustering algorithm DBSCAN (Density-Based Spatial Clustering of Applications with Noise), which enables density clustering of uncertain data.
(2) This thesis proposes an improved adaptive clustering algorithm, IN-DBSCANa (IN-DBSCAN adaptive algorithm), built on IN-DBSCAN. IN-DBSCANa replaces the fixed probability threshold in IN-DBSCAN with the maximum reachable probability, and then proposes a density parameter adaptive strategy based on Gaussian-Means, which effectively avoids the influence of human factors on the clustering result. As a result, IN-DBSCANa achieves automatic clustering.
(3) This thesis tests the performance of the proposed algorithms against six uncertain data clustering algorithms (UK-Means, MMVar, FDBSCAN, FOPTICS, KKL, and REP) on synthetic datasets, real benchmark datasets, and real-world datasets. Experimental results show that the proposed algorithms achieve better clustering accuracy and operational efficiency than the existing uncertain data clustering algorithms, which makes them more competitive.
Keywords/Search Tags: Uncertain Data, Clustering, Interval Number, Distance Calculation Strategy, Adaptive Threshold
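
The general idea summarized in the abstract, describing each uncertain object by interval numbers and then running density clustering on object-level distances, can be illustrated with a minimal Python sketch. It is not the thesis's IN-DBSCAN: the abstract does not give the distance formula or the redefined density concepts, so the endpoint-based distance below, the helper names (to_interval, interval_distance, dbscan), and the eps and min_pts values are all assumptions chosen for illustration. The clustering step is plain DBSCAN with a fixed distance threshold, not the probabilistic or adaptive variants the thesis proposes.

```python
import math


def to_interval(instances):
    """Collapse one uncertain object's sampled instances into
    per-attribute (lower, upper) bounds."""
    return [(min(dim), max(dim)) for dim in zip(*instances)]


def interval_distance(a, b):
    """Assumed generic distance on interval endpoints (not the thesis's
    strategy). It reduces to Euclidean distance when every interval is
    a single point."""
    total = sum((al - bl) ** 2 + (au - bu) ** 2
                for (al, au), (bl, bu) in zip(a, b))
    return math.sqrt(total / 2)


def dbscan(objects, eps, min_pts):
    """Classic DBSCAN over interval objects; returns a label per object,
    with -1 marking noise."""
    n = len(objects)
    # One distance per object pair, instead of one per instance pair.
    neighbors = [
        [j for j in range(n)
         if interval_distance(objects[i], objects[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None = unvisited, -1 = noise
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1
            continue
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # expand from core objects only
        cluster += 1
    return labels


# Hypothetical uncertain objects, each observed as a few 2-D instances.
raw_objects = [
    [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1)],
    [(1.1, 1.3), (1.3, 1.2)],
    [(0.8, 0.9), (1.0, 0.7)],
    [(5.0, 5.0), (5.2, 5.1)],
    [(5.1, 4.8), (4.9, 5.0)],
    [(5.3, 5.2), (5.0, 5.3)],
    [(9.0, 1.0), (9.1, 1.1)],
]
intervals = [to_interval(obj) for obj in raw_objects]
labels = dbscan(intervals, eps=1.0, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

In this sketch each pair of objects costs one distance computation regardless of how many instances were sampled, which is the kind of saving the efficiency argument in the abstract points to. The fixed eps and min_pts are exactly the manually chosen thresholds that the adaptive variant is meant to remove.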