
Large Uncertain Database Clustering

Posted on: 2012-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Li
Full Text: PDF
GTID: 2218330338955748
Subject: Computer software and theory
Abstract/Summary:
Cluster analysis is one of the main methods of data mining and is attracting growing attention. Clustering divides a set of objects into several clusters so that objects within the same cluster are as similar as possible and objects in different clusters are as different as possible. As clustering is applied more and more widely in practice, improving its efficiency becomes increasingly important: a clustering algorithm that runs too slowly or needs too much storage has little practical value, however good its results. The introduction of uncertain data makes clustering considerably harder. Clustering of uncertain data is an active research topic with practical application value. Because the data are uncertain, the expected distance between each object and its cluster representative must be calculated. This calculation is very time-consuming because each object has its own arbitrary probability density function, and it is the main factor limiting the efficiency of these algorithms. The ck-means algorithm was proposed to improve the efficiency of clustering uncertain data, but when the number of objects to be clustered is very large, the cost of clustering is still very high. This thesis improves the efficiency of uncertain data mining by optimizing the classical ck-means algorithm. The following aspects are included:

Firstly, the thesis introduces the basic concepts and main methods of data mining and clustering. Secondly, it introduces the concepts of uncertainty relevant to data clustering, the classical uncertain data clustering algorithm uk-means and its pruning algorithms, and the ck-means algorithm. Thirdly, it presents an improved ck-means algorithm, based on the kd-tree, that only needs to calculate the distance from each object to some of the centroids, which greatly improves the efficiency of ck-means. Experiments validate the new algorithm. Fourthly, it presents the cf-means algorithm, based on the cf-tree, and describes in detail how to construct the cf-tree, its reconstruction rules, and how it improves clustering efficiency. Fifthly, experiments on synthetic data show that the optimization strategies are effective and that the two improved algorithms yield significant improvements. Finally, the conclusions and future work are presented.
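The abstract does not spell out why the expected distance is expensive or how ck-means reduces that cost. As a rough illustration only, the sketch below assumes squared Euclidean distance and represents an uncertain object by equally weighted sample points (both are assumptions, not details taken from the thesis). Under those assumptions, the expected squared distance to a cluster representative splits into the squared distance from the object's centroid plus a spread term that does not depend on the representative, which is the kind of shortcut ck-means is commonly described as exploiting. All function names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(samples):
    """Component-wise mean (centroid) of an object's sample points."""
    n = len(samples)
    dim = len(samples[0])
    return [sum(p[d] for p in samples) / n for d in range(dim)]

def spread(samples):
    """Average squared distance from the samples to their own centroid.
    It does not depend on any cluster representative, so it can be
    computed once per object and reused."""
    m = mean_point(samples)
    return sum(sq_dist(p, m) for p in samples) / len(samples)

def expected_sq_dist_bruteforce(samples, rep):
    """Expensive route: average the squared distance over every sample."""
    return sum(sq_dist(p, rep) for p in samples) / len(samples)

def expected_sq_dist_fast(centroid, spread_value, rep):
    """Cheap route: one point-to-point distance plus the cached spread."""
    return sq_dist(centroid, rep) + spread_value

# Assumption: an uncertain 2-D object approximated by 1000 equally weighted samples.
random.seed(0)
samples = [[random.gauss(2.0, 0.5), random.gauss(-1.0, 0.3)] for _ in range(1000)]
rep = [0.5, 0.5]  # hypothetical cluster representative

slow = expected_sq_dist_bruteforce(samples, rep)
fast = expected_sq_dist_fast(mean_point(samples), spread(samples), rep)
print(abs(slow - fast) < 1e-9)
```

The brute-force route touches every sample for every representative in every iteration, while the fast route needs the centroid and spread only once per object. The kd-tree and cf-tree optimizations described in the abstract go further than this sketch and are not reproduced here.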
Keywords/Search Tags:clustering, k-d tree, CF tree, ck-means algorithm, improve