
Research On Partition-based Clustering Of Uncertain Data

Posted on: 2019-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: D K Tang
Full Text: PDF
GTID: 2348330566959017
Subject: Computer Science and Technology
Abstract/Summary:
In fields such as sensor networks, radio frequency identification, and financial services, network delay, sensor noise, and the protection of user data privacy often introduce uncertainty into the original data. How to use such data reasonably and effectively, rather than simply discarding it, is a crucial issue in uncertain data analysis. Clustering of uncertain data is one of the research hotspots in this area. Unlike a certain data object, an uncertain data object is no longer a single sample point; it is represented by a set of points that follow a common probability distribution.

Most clustering algorithms for uncertain data extend a clustering algorithm for certain data by substituting a different similarity measure, and the expected distance is the most widely used measure. However, uncertain data objects with similar probability distributions tend to overlap, and in that case a geometric distance such as the expected distance cannot distinguish them effectively. For this kind of uncertain data, this thesis uses KL-divergence as the similarity measure and, based on the fuzzy C-means algorithm, proposes an uncertain clustering algorithm, UFCM-KL. In addition, to overcome the sensitivity of UFCM-KL to the initial center points, this thesis uses the idea of density clustering to improve the algorithm.

The main contributions are as follows: (1) The fuzzy C-means algorithm is extended so that it can cluster uncertain data. (2) KL-divergence is used as the similarity measure instead of the expected distance, and the KL-divergence is symmetrized and smoothed to address its asymmetry. (3) Because UFCM-KL is sensitive to the initial values and easily falls into a local optimum, a new initial value selection method is proposed: it selects uncertain objects that have high density and lie relatively far from one another, so as to minimize the objective function.

This thesis compares five algorithms: UK-means, UK-medoids, UK-medoids-KL, UFCM-KL, and the improved UFCM-KL. First, to demonstrate the effectiveness of the proposed algorithms, the five algorithms are used to cluster the UCI datasets Iris, Wine, and Glass, and the F1 values of the clustering results indicate that the proposed algorithms are effective. Second, to evaluate efficiency, the five algorithms are run on synthetic uncertain data and their clustering times are compared; UFCM-KL is the most efficient. Finally, the effect of the parameters on the clustering results is examined by comparing, on the synthetic data, how the parameters influence the accuracy and recall of the five algorithms. The experimental results show that UFCM-KL and the improved UFCM-KL are effective and that, compared with UK-means, UK-medoids, and UK-medoids-KL, they deliver good clustering quality in terms of both running efficiency and applicability across parameter settings.
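
As a rough illustration of the similarity measure and fuzzy assignment described in the abstract, the following Python sketch computes a symmetrized, smoothed KL-divergence between two discrete distributions and then derives fuzzy C-means style memberships from a dissimilarity matrix. The abstract does not state the exact formulas, so several details here are assumptions: the additive smoothing constant `eps`, the averaging of the two KL directions, the use of the divergence in place of the squared distance in the standard fuzzy C-means membership update, and the function names `symmetric_kl` and `fuzzy_memberships`.

```python
import numpy as np


def symmetric_kl(p, q, eps=1e-9):
    """Symmetrized, smoothed KL-divergence between two discrete distributions.

    Assumption: smoothing adds a small constant eps to every bin and
    renormalizes; symmetrization averages KL(p||q) and KL(q||p).
    """
    p = np.asarray(p, dtype=float) + eps  # new array, input is not mutated
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    kl_pq = np.sum(p * np.log(p / q))
    kl_qp = np.sum(q * np.log(q / p))
    return 0.5 * (kl_pq + kl_qp)


def fuzzy_memberships(dissim, m=2.0, tiny=1e-12):
    """Fuzzy C-means style memberships from a dissimilarity matrix.

    dissim has shape (n_objects, n_clusters). Assumption: the divergence
    plays the role of the squared distance, so membership is proportional
    to dissim ** (-1 / (m - 1)), normalized per object.
    """
    d = np.maximum(np.asarray(dissim, dtype=float), tiny)  # avoid division by zero
    inv = d ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


# Two hypothetical uncertain objects and two hypothetical cluster centers,
# each summarized as a histogram over the same three bins.
objects = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
centers = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]
dissim = np.array([[symmetric_kl(o, c) for c in centers] for o in objects])
memberships = fuzzy_memberships(dissim)
```

Each row of `memberships` sums to 1, and an object receives a higher membership in the cluster whose center distribution is closer to its own under the symmetrized divergence.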
Keywords/Search Tags:Uncertain data, Partition clustering, Similar probability distribution, KL-divergence, UFCM-KL algorithm