Study On Clustering Ensembles And Its Application In Telecommunication

Posted on: 2009-06-26
Degree: Master
Type: Thesis
Country: China
Candidate: J L Wang
Full Text: PDF
GTID: 2178360242492093
Subject: Control theory and control engineering
Abstract/Summary:
Clustering is one of the most active topics in data mining research. Many clustering algorithms are available, such as K-means, K-medoids, BIRCH, CURE, DBSCAN, and STING. Although some of them are widely applied, it is hard to find a suitable clustering algorithm for a given data set, because each algorithm places restrictions on the data it can cluster well. Clustering ensembles emerged in response to this problem. The clustering ensemble approach was put forward in 2002 and soon attracted extensive attention. Experiments have shown that it can give better results than a single clustering algorithm. The approach is still far from mature, however: open issues include the setting of some key parameters, the ensemble of 'soft' and 'hard' clusters, and the design and choice of consensus functions. The main work of this thesis is as follows.

1. Building on a thorough study of clustering ensembles, the thesis examines the relationship between the number of clusters in each clusterer and the quality of the final result, and proposes an improved algorithm that raises the accuracy of the clustering ensemble. First, based on the idea that there is diversity among clusters, a formula to measure this diversity is defined. Second, experiments are used to test whether the difference between a clusterer's number of clusters and the target number affects the ensemble result, and a formula for calculating the weights of the clusterers is then developed. Experimental data show that the improved algorithm is more accurate than the original algorithm.

2. K-means has long been used in customer segmentation models in telecommunications, but this method has several problems: it needs professionals to specify the number of clusters and to judge the results from experience, and the resulting partition is "too hard".
In this thesis, the improved clustering ensemble (ICE) algorithm is applied to customer segmentation, with data mining on the PHS business of a city's telecommunications company as the background. The segmentation is based on customers' calls, messages, and other behavioral attributes. The ICE algorithm effectively addressed the problems mentioned above and produced a reasonable result. In addition, the co-association matrix is used to show the probability of each customer belonging to a cluster. This "softens" the result and can make data mining more intelligent.
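The role of the co-association matrix can be illustrated with a minimal Python sketch. It is not the thesis's ICE algorithm: the abstract does not give the diversity or weight formulas, so the `weights` argument, the function names, the toy partitions, and the normalization used to turn co-association values into per-cluster membership scores are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence


def co_association(partitions: Sequence[Sequence[int]],
                   weights: Optional[Sequence[float]] = None) -> List[List[float]]:
    """Weighted fraction of base clusterings that place points i and j together.

    `weights` is a hypothetical stand-in for per-clusterer weights;
    with no weights, every clusterer counts equally.
    """
    n = len(partitions[0])
    if weights is None:
        weights = [1.0] * len(partitions)
    total = sum(weights)
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(partitions, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += w / total
    return matrix


def soft_membership(matrix: List[List[float]],
                    consensus: Sequence[int]) -> List[Dict[int, float]]:
    """Assumed softening rule: average a point's co-association with the
    members of each consensus cluster, then normalize the scores to sum to 1."""
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(consensus):
        clusters.setdefault(label, []).append(idx)
    result = []
    for i in range(len(consensus)):
        raw = {c: sum(matrix[i][j] for j in members) / len(members)
               for c, members in clusters.items()}
        norm = sum(raw.values())  # > 0 because matrix[i][i] == 1
        result.append({c: v / norm for c, v in raw.items()})
    return result


# Toy data: three base clusterings of five customers, with different numbers of clusters.
partitions = [
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 2, 2],
]
consensus = [0, 0, 0, 1, 1]  # assumed final hard partition

matrix = co_association(partitions)
soft = soft_membership(matrix, consensus)
for customer, scores in enumerate(soft):
    print(customer, {c: round(s, 3) for c, s in scores.items()})
```

In this toy data, customer 2 is grouped with customers 0 and 1 by only one of the three base clusterings, so its score is split between the two consensus clusters (0.625 and 0.375), while customer 0 is assigned entirely to cluster 0. This is the sense in which a co-association matrix can soften a hard partition.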
Keywords/Search Tags: Data Mining, Clustering Algorithm, Clustering Ensemble, Diversity, Consensus Function, Telecommunication, Customer Segmentation, Data Preprocessing