A Preliminary Study Of Clustering Method For Uncertain Data

Posted on: 2021-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: G Y Wang
Full Text: PDF
GTID: 2428330620972191
Subject: Computer technology
Abstract/Summary:
Today's data are massive and diverse, which makes data mining and cluster analysis difficult. Real-world data are also uncertain, which makes it harder still to obtain valuable information. How to extract valuable information from uncertain data sets has therefore become a research focus in recent years. Uncertainty in data is mainly divided into existence-level and attribute-level uncertainty. To better understand these two kinds of uncertain data, this thesis makes the following contributions.

First, Chapter 3 proposes ULDC, a clustering algorithm for uncertain data based on local density. A study of existing density-based clustering algorithms for uncertain data shows that some of them have shortcomings when clustering uncertain data, and ULDC is proposed to address these shortcomings. The chapter first improves the measure of similarity between uncertain data objects, then introduces the concepts underlying ULDC, such as local density and the data chain, and finally describes the overall procedure of the algorithm. Compared with DBSCAN and other algorithms, ULDC requires fewer parameters, and its clustering results are less sensitive to them. Experiments show F1 values of 0.8876 and 0.9086 on the Iris and Connect-4 data sets, respectively, which indicates good clustering quality.

Second, Chapter 4 proposes UBFCM, a clustering algorithm for uncertain data obtained by improving the fuzzy c-means clustering algorithm. The motivation is that real-world data objects are generally uncertain and the boundaries between them are fuzzy. The chapter first explains the principle of fuzzy c-means in detail, which lays the foundation for the rest of the work, and then defines the clustering model for uncertain data. Replacing each uncertain data object with its centroid simplifies the clustering algorithm. Finally, a new similarity calculation method is used to compute the similarity between uncertain data objects and improve clustering quality. On the Iris, Wine and Glass data sets, UBFCM achieves F1 values of 0.8965, 0.7642 and 0.6248, respectively, all higher than those of the UK-means algorithm, which indicates that the algorithm is effective.
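
The centroid-replacement idea described for Chapter 4 can be illustrated with a minimal sketch. It is not the thesis's UBFCM implementation. Several things here are assumptions made only for illustration: each uncertain object is represented as a list of sample points, the names `centroid` and `fuzzy_c_means` are hypothetical, and plain Euclidean distance stands in for the thesis's new similarity measure, which the abstract does not specify. The clustering step itself is standard fuzzy c-means.

```python
import math
import random


def centroid(samples):
    """Mean of the sample points that represent one uncertain object (assumed representation)."""
    dim = len(samples[0])
    return [sum(p[d] for p in samples) / len(samples) for d in range(dim)]


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means with Euclidean distance; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by membership^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            wsum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / wsum for d in range(dim)]
            )
        # Membership update: u_ij proportional to d_ij^(-2/(m-1)).
        shift = 0.0
        for i in range(n):
            dist = [max(math.dist(points[i], centers[j]), 1e-12) for j in range(c)]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
            total = sum(inv)
            new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:  # memberships have stopped changing
            break
    return centers, u


# Toy uncertain objects (made-up values): two samples per object, two obvious groups.
objects = [
    [[0.0, 0.1], [0.2, -0.1]],
    [[0.1, 0.0], [-0.1, 0.2]],
    [[5.0, 5.1], [5.2, 4.9]],
    [[4.9, 5.0], [5.1, 5.2]],
]
points = [centroid(o) for o in objects]  # replace each object by its centroid
centers, u = fuzzy_c_means(points, c=2)
labels = [max(range(2), key=lambda j: row[j]) for row in u]  # hardened cluster labels
```

Collapsing each object to its centroid before clustering means the iteration works on one point per object rather than on every sample, which is the simplification the abstract refers to. Any information carried by the spread of the samples is lost unless the distance measure reintroduces it.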
Keywords/Search Tags: uncertain data, clustering, relative density, fuzzy c-means