Research On Clustering Of Uncertain Data

Posted on: 2015-06-28
Degree: Master
Type: Thesis
Country: China
Candidate: S B Su
GTID: 2298330467988488
Subject: Computer technology
Abstract/Summary:
With the continuous progress of science and technology, more and more data are stored in databases. Because of inaccurate measurements, obsolete data, sampling error, and other factors, these data contain uncertainty. Much of this data is also noisy, sparsely distributed, and high dimensional.

Clustering analysis is an important field of data mining, and its application to uncertain data has attracted wide attention from researchers. This thesis discusses the characteristics of uncertain data, the theory of uncertain data mining, clustering theory, and several classic uncertain clustering algorithms. On this basis, it proposes a new, efficient clustering algorithm for uncertain data that contains noise and forms clusters of arbitrary shape. It also defines a new similarity function for high-dimensional uncertain data and designs an efficient clustering algorithm for such data.

Handling noisy data, discovering clusters of arbitrary shape, and reducing the heavy dependence of input parameters on domain knowledge remain challenging problems in uncertain data clustering. This thesis borrows the concept of the expected center of an uncertain object from CK-means and proposes the uncertainty expected center nearest neighbor search clustering (UECNNSC) algorithm, which is based on nearest neighbor search. The algorithm also uses a maximum threshold strategy: the nearest neighbor search is restricted to a bounded range, which further reduces the time overhead. The algorithm first calculates the expected center of each uncertain object, then clusters the objects by nearest neighbor search on their expected centers within the given threshold. During clustering, the algorithm scans the data only once, which avoids computing the distance from every cluster center to every expected center.
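The expected-center and thresholded nearest-neighbor steps described above can be illustrated with a minimal Python sketch. It is not the thesis's UECNNSC implementation. It assumes each uncertain object is given as a list of equally weighted sample points, that objects with no neighbor inside the threshold are treated as noise, and that linked objects are merged with a union-find structure. The names `expected_center`, `nn_threshold_clustering`, and `max_dist` are hypothetical.

```python
import math


def expected_center(samples):
    """Mean of an uncertain object's sample points (equal weights assumed)."""
    dim = len(samples[0])
    return tuple(sum(p[d] for p in samples) / len(samples) for d in range(dim))


def nn_threshold_clustering(objects, max_dist):
    """Link each object to its nearest neighbor if it lies within max_dist.

    Returns one label per object; -1 marks objects treated as noise
    (an assumption of this sketch, not a rule stated in the abstract).
    """
    centers = [expected_center(samples) for samples in objects]
    n = len(centers)
    parent = list(range(n))  # union-find forest over object indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    linked = [False] * n
    # Single pass over the objects. The inner loop is a brute-force
    # neighbor search; a spatial index could replace it.
    for i, center in enumerate(centers):
        best, best_d = None, max_dist
        for j, other in enumerate(centers):
            if j == i:
                continue
            d = math.dist(center, other)
            if d <= best_d:
                best, best_d = j, d
        if best is not None:
            linked[i] = linked[best] = True
            parent[find(i)] = find(best)

    labels, ids = [], {}
    for i in range(n):
        if not linked[i]:
            labels.append(-1)
        else:
            labels.append(ids.setdefault(find(i), len(ids)))
    return labels
```

Because objects are chained through their nearest neighbors instead of being compared with cluster centers, elongated or irregular groups can end up in one cluster, which is the property the abstract attributes to the approach.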
Finally, an extension of the algorithm is proposed: a limited number of objects added to the data set can be assigned to clusters according to the distance from their expected centers to their neighboring points. Theory and experiments show that, compared with some current algorithms, UECNNSC filters noise effectively and efficiently obtains clusters of arbitrary shape from uncertain data with little prior knowledge.

High-dimensional data brings new challenges for uncertain clustering algorithms. Combining the uncertainty and the high dimensionality of the data objects, this thesis defines a metric function that can accurately express the similarity between high-dimensional uncertain objects and proposes the high dimensional uncertain data efficient clustering (HDUDEC) algorithm, which is based on agglomerative hierarchical clustering. The algorithm searches for clusters according to a similarity threshold and produces one cluster per search, which avoids repeated iterative computation over the data objects. Similarly, a limited number of newly added high-dimensional objects can be classified simply by calculating their similarity to each cluster. Theory and experiments show that, compared with some current algorithms, HDUDEC obtains arbitrary-shape clusters of uncertain high-dimensional data quickly and efficiently.
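The threshold-driven search and the handling of newly added objects can be sketched roughly as follows. The abstract does not give the thesis's similarity function, so `placeholder_similarity` below is only a stand-in (an inverse Euclidean distance on expected-value vectors), and the cluster-growing rule is an assumption. All function and parameter names are hypothetical.

```python
import math
from collections import deque


def placeholder_similarity(a, b):
    """Stand-in similarity in (0, 1]; NOT the metric defined in the thesis."""
    return 1.0 / (1.0 + math.dist(a, b))


def threshold_clusters(vectors, min_sim, similarity=placeholder_similarity):
    """Grow one cluster per search from a seed, using a similarity threshold."""
    unassigned = set(range(len(vectors)))
    clusters = []
    while unassigned:
        seed = min(unassigned)
        unassigned.remove(seed)
        cluster, frontier = [seed], deque([seed])
        while frontier:
            cur = frontier.popleft()
            # Pull in every unassigned object similar enough to a member.
            close = [j for j in unassigned
                     if similarity(vectors[cur], vectors[j]) >= min_sim]
            for j in sorted(close):
                unassigned.remove(j)
                cluster.append(j)
                frontier.append(j)
        clusters.append(cluster)
    return clusters


def assign_new(vector, vectors, clusters, min_sim,
               similarity=placeholder_similarity):
    """Index of the most similar cluster, or None if none reaches min_sim."""
    best_idx, best_sim = None, min_sim
    for idx, members in enumerate(clusters):
        s = max(similarity(vector, vectors[m]) for m in members)
        if s >= best_sim:
            best_idx, best_sim = idx, s
    return best_idx
```

Each object is assigned exactly once and never revisited, which mirrors the abstract's point about avoiding repeated iterative computation. A new object is placed by comparing it with the existing clusters only.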
Keywords/Search Tags: expected center, nearest neighbor search, high dimensional uncertain data, agglomerative hierarchical clustering, similarity measure, uncertain clustering