Large Uncertain Database Clustering

Posted on:2012-01-29

Degree:Master

Type:Thesis

Country:China

Candidate:Y F Li

Full Text:PDF

GTID:2218330338955748

Subject:Computer software and theory

Abstract/Summary:


Cluster analysis is one of the main methods of data mining and is attracting more and more attention. Clustering divides a set of objects into several clusters so that objects within the same cluster are as similar as possible and objects in different clusters are as different as possible. As clustering is applied ever more widely in practice, improving its efficiency becomes increasingly important: an algorithm that runs too slowly or needs too much storage has little practical value, however good its clustering quality. The introduction of uncertain data greatly increases the difficulty of clustering, and clustering uncertain data has become an active research topic with real application value.

Because the data is uncertain, the expected distance between each object and each cluster representative must be computed. This computation is very time-consuming, because each object has its own, arbitrary probability density function, and it is the main factor limiting the efficiency of such algorithms. The ck-means algorithm was proposed to improve the efficiency of clustering uncertain data, but when the number of objects to cluster is very large, the cost of clustering is still high. This thesis improves the efficiency of uncertain data mining by optimizing the classical ck-means algorithm. It covers the following aspects.

Firstly, the thesis introduces the basic concepts and main methods of data mining and clustering. Secondly, it introduces the concepts of uncertainty relevant to data clustering, the classical uk-means algorithm for clustering uncertain data together with its pruning algorithms, and the ck-means algorithm. Thirdly, it presents an improved ck-means algorithm, based on the k-d tree, that only needs to compute the distance from an object to a subset of the centroids, which greatly improves the efficiency of ck-means; experiments confirm the validity of the new algorithm. Fourthly, it presents the cf-means algorithm, based on the CF tree, and describes in detail how the CF tree is constructed, the rules for rebuilding it, and how it improves clustering efficiency. Fifthly, experiments on synthetic data show that the optimization strategies are effective and that the two improved algorithms yield significant gains. Finally, the conclusion and future work are presented.
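The cost of the expected distance can be illustrated with a small sketch. It is not the thesis's implementation. It assumes that each uncertain object is represented by a finite set of samples and that the distance is the squared Euclidean distance. Under those assumptions the expected squared distance to a centroid equals the squared distance from the object's mean to the centroid plus the object's total variance, which is the identity usually given as the basis of ck-means. All function names below are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_and_variance(samples):
    """Mean point of the samples and their total variance (summed over dimensions)."""
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dim)]
    var = sum(sq_dist(s, mean) for s in samples) / n
    return mean, var


def expected_sq_dist_bruteforce(samples, centroid):
    """Average the squared distance over every sample: O(samples) per centroid."""
    return sum(sq_dist(s, centroid) for s in samples) / len(samples)


def expected_sq_dist_fast(mean, var, centroid):
    """Same value from the precomputed mean and variance: O(1) per centroid."""
    return sq_dist(mean, centroid) + var


# One uncertain 2-D object, represented by 1000 samples (assumed representation).
rng = random.Random(0)
samples = [(rng.gauss(2.0, 0.5), rng.gauss(-1.0, 1.5)) for _ in range(1000)]
centroids = [(0.0, 0.0), (2.0, -1.0), (5.0, 3.0)]

# Precompute once per object, then reuse for every centroid and every iteration.
mean, var = mean_and_variance(samples)

for c in centroids:
    slow = expected_sq_dist_bruteforce(samples, c)
    fast = expected_sq_dist_fast(mean, var, c)
    assert abs(slow - fast) < 1e-9  # both routes agree up to rounding

# The variance term is the same for every centroid, so the nearest centroid
# by expected distance is simply the one nearest to the object's mean.
nearest = min(range(len(centroids)),
              key=lambda i: expected_sq_dist_fast(mean, var, centroids[i]))
```

Since the variance term does not depend on the centroid, the assignment step reduces to a nearest-centroid search on the object means. That is the step an index such as a k-d tree can speed up by examining only part of the centroids.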

Keywords/Search Tags:

clustering, k-d tree, CF tree, ck-means algorithm, improve


Related items

1	K-means Based On Binary And Svm Decision Tree Algorithm Of Data Mining Research
2	KK-means Clustering Method Improved Based-on Minimum Cost Spanning Tree And Its Applications In Seismic Data
3	Research On Clustering DLIS-R Tree Algorithm Based On Spatial Data
4	The Three-Dimensional Index Structure Of R~*-tree Based On The Minimum Bounding Box And The Adaptive Clustering
5	Research On Clustering Algorithm Based On Tree Center Of Gravity And Cut Edge Constraints
6	Research On Web Document Clustering Approaches Based On Phrase Features
7	Research On Parallel K-Means Algorithm Based On MapReduce
8	Optimization Method On Node Self-Adaptive Splitting Of R~*S-Tree
9	Research On K-means Clustering Algorithm Based On Coresets
10	Research And Application Of Web Chinese Text Clustering Algorithm Based On Minimum Spanning Tree