
Research and Implementation of a Data Mining System Based on Improved Clustering Algorithms

Posted on: 2013-05-26  Degree: Master  Type: Thesis
Country: China  Candidate: M Y Yang  Full Text: PDF
GTID: 2248330374485431  Subject: Computer technology
Abstract/Summary:
Clustering groups data points into clusters so that intra-cluster similarity is maximized and inter-cluster similarity is minimized. It plays an important role in data mining and is widely applied in pattern recognition, computer vision, and fuzzy control. Various types of clustering methods have been proposed and developed. Clustering algorithms fall mainly into several groups: hierarchical clustering, partitioning clustering, density-based methods, and fuzzy-based methods.

The fuzzy K-Means clustering algorithm, proposed by Bezdek in 1981, is a partition-based cluster analysis method. It is widely used in cluster analysis because it is efficient and scalable and converges quickly on large data sets. However, it also has deficiencies: the initial cluster centers are selected arbitrarily, and every feature of the samples contributes equally to the clustering.

In this thesis, we propose an improved fuzzy K-Means clustering algorithm based on an average-distance algorithm and Weight Computing. The algorithm starts by computing the distance between every pair of data samples and then selects data from dense regions as the initial centers. Weight Computing is the next step. It introduces a novel way to compute the weights that represent the contributions of different features to the clusters, taking the entire data space into consideration. The weight of a dimension in a cluster can be treated as the degree to which that dimension contributes to the cluster. Because it varies between clusters, it makes the dissimilarity between them more pronounced. The weights speed up the clustering process and yield more satisfactory clustering results.

BIRCH is an integrated hierarchical clustering algorithm. It uses two concepts, the clustering feature (CF) and the clustering feature tree (CF tree), to describe clusters in general. This thesis analyzes the inadequacies of BIRCH and proposes an algorithm based on density and a dynamic threshold that adapts to data sets of arbitrary shape. It combines two parameters, density and threshold, and changes the threshold T dynamically according to the intrinsic characteristics of the data set. This both controls the size of the CF tree and allows different globular clusters to be used to approximate clusters of arbitrary shape. Experimental results show that its algorithmic complexity is comparable to that of BIRCH, that it greatly reduces the size of the CF tree, and that on clusters of arbitrary shape it achieves results similar to those of DBSCAN.

The software and IT services industry has been expanding year by year, and it is extremely competitive. Each company's costs, turnover, profit, and similar figures change in real time. It is very difficult for the government sector to manage these data, and it can extract only a little information from many large data sets.

The data mining system presented here targets the software and information services industry and effectively meets user needs. It is closely integrated with a data warehouse, and its mining is efficient and interactive. The system can analyze the operation of the software and IT services industry in depth and provide decision support for leaders, helping them guide the industry toward healthy, rapid, and orderly development.
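For readers unfamiliar with the baseline that the abstract sets out to improve, the following is a minimal sketch of standard fuzzy K-Means (fuzzy c-means), written in plain Python. It is not the improved algorithm of the thesis: it uses arbitrary (random) initialization and gives every feature equal weight, which are exactly the two deficiencies the abstract names. The function name `fuzzy_c_means`, its parameters, and the toy data are assumptions made for illustration only.

```python
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Baseline fuzzy c-means sketch. Returns (centers, memberships).

    Hypothetical helper for illustration; m is the usual fuzzifier (m > 1).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Arbitrary start: random memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update from Euclidean distances to the centers.
        # Every feature contributes equally here (no feature weights).
        new_u = []
        exponent = 2.0 / (m - 1.0)
        for p in points:
            dist = [
                sum((p[d] - ctr[d]) ** 2 for d in range(dim)) ** 0.5
                for ctr in centers
            ]
            if any(x == 0.0 for x in dist):
                # Point coincides with a center: give it full membership there.
                hits = [1.0 if x == 0.0 else 0.0 for x in dist]
                total = sum(hits)
                new_u.append([h / total for h in hits])
            else:
                new_u.append(
                    [1.0 / sum((dist[j] / dist[k]) ** exponent
                               for k in range(c))
                     for j in range(c)]
                )

        # Stop once the memberships have stabilised.
        change = max(
            abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c)
        )
        u = new_u
        if change < tol:
            break
    return centers, u


# Toy data (made up): two well-separated groups of three points each.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
        (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, memberships = fuzzy_c_means(data, c=2)
# Hard labels: the cluster with the highest membership for each point.
labels = [max(range(2), key=lambda j: row[j]) for row in memberships]
```

Each row of `memberships` sums to 1, so a point can belong partly to several clusters. An approach like the one described in the abstract would replace the random start with centers chosen from dense regions and replace the plain Euclidean distance with a per-cluster weighted distance.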
Keywords/Search Tags: data mining, clustering algorithm, fuzzy K-Means, BIRCH