
Research And Improvement Of Uncertain Clustering Algorithm For Interval Valued Data

Posted on: 2020-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: J W Wang
Full Text: PDF
GTID: 2428330575994241
Subject: Computer software and theory
Abstract/Summary:
With the development of data mining and the wide application of uncertain data, methods for mining the intrinsic information in uncertain data have received increasing attention. Clustering is one of the classical techniques in data mining, and how to apply it to uncertain data sets is an important research subject. Interval-valued (symbolic interval) data is an important representation of uncertain attributes. Building on existing clustering algorithms for uncertain data, this thesis proposes a new partition clustering algorithm, EFCM-ID (efficient fuzzy c-means for interval-valued data), and a new density clustering algorithm, ADBSCAN-ID (adaptive density-based spatial clustering of applications with noise for interval-valued data). In clustering problems for uncertain interval data, the points within an interval are usually assumed to be uniformly distributed, which makes it hard to describe the interval accurately. Based on quartiles, the MQ (median quartile-spacing) distance metric is designed to describe interval data with a general distribution more precisely. Furthermore, FCM clustering results are strongly affected by the initial cluster centers, and the membership degrees are updated slowly. To improve the efficiency of the algorithm, EFCM-ID is proposed for clustering generally distributed interval data. A sample is drawn from the whole dataset and density centers are picked from it as the initial cluster centers; this method is named SDCS (sampling-based density center selection). To reduce the running time, a new membership-update measure founded on competitive learning theory is devised; it accelerates the update to different degrees according to the value of the membership degree. In DBSCAN, the values of Eps and MinPts, which must be set manually, have a great effect on the clustering result. NDEM (neighborhood differential expansion method), which fully considers the spatial distribution of samples during density expansion, is designed to solve this problem, and ADBSCAN-ID is proposed on this basis. Experiments on simulated uncertain datasets verify the validity of the MQ distance. Experiments on EFCM-ID and ADBSCAN-ID are also carried out, and the results show that both algorithms obtain better clustering results on uncertain data.
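The abstract does not give the formula for the MQ distance, so the following is only a minimal sketch of how a quartile-based interval distance could feed the standard FCM membership formula. The function names `quartile_summary`, `mq_like_distance`, and `fcm_memberships`, and the specific choice of a Euclidean distance between (median, quartile spacing) pairs, are assumptions for illustration and are not taken from the thesis. The SDCS initialization and the competitive-learning acceleration are not shown.

```python
import math
import statistics


def quartile_summary(samples):
    """Return (median, quartile spacing Q3 - Q1) of the values observed in one interval."""
    q1, q2, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return q2, q3 - q1


def mq_like_distance(samples_a, samples_b):
    """Hypothetical MQ-style distance (assumed form, not the thesis's definition):
    Euclidean distance between the (median, quartile spacing) pairs of two intervals."""
    med_a, spacing_a = quartile_summary(samples_a)
    med_b, spacing_b = quartile_summary(samples_b)
    return math.hypot(med_a - med_b, spacing_a - spacing_b)


def fcm_memberships(distances, m=2.0):
    """Standard FCM membership degrees of one object, given its distance to each center.

    u_i = d_i^(-2/(m-1)) / sum_j d_j^(-2/(m-1)), with fuzzifier m > 1.
    """
    # An object that coincides with one or more centers belongs fully to them.
    zeros = [i for i, d in enumerate(distances) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(distances))]
    exponent = 2.0 / (m - 1.0)
    inverse = [d ** -exponent for d in distances]
    total = sum(inverse)
    return [value / total for value in inverse]


# Illustrative data: each list holds values sampled inside one interval.
obj = [1.0, 1.2, 1.5, 1.9, 2.0]
center_a = [1.0, 1.3, 1.5, 1.8, 2.0]
center_b = [4.0, 5.0, 6.0, 7.0, 8.0]
memberships = fcm_memberships(
    [mq_like_distance(obj, center_a), mq_like_distance(obj, center_b)]
)
```

Summarizing each interval by its quartiles, rather than only its endpoints, is what lets a distance of this kind react to non-uniform distributions inside the interval; the memberships for one object always sum to 1, and here the object is assigned mostly to `center_a`.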
Keywords/Search Tags:uncertain cluster algorithm, interval data, FCM, density based, competitive learning, DBSCAN, neighborhood differential