Research And Application Of Improved Fuzzy C-Means Clustering Algorithm Based On Weka Platform

Posted on: 2014-11-30
Degree: Master
Type: Thesis
Country: China
Candidate: W J Zheng
GTID: 2268330401977118
Subject: Computer technology
Abstract/Summary:
Data mining extracts useful information and knowledge from large volumes of data, and clustering is among its most widely used and studied families of algorithms. The fuzzy C-means clustering algorithm applies fuzzy set theory: rather than forcing each instance into a single category, it assigns the instance a degree of membership in every category, which makes the analysis of the clustered data more objective.

The thesis first analyzes the fuzzy C-means clustering algorithm. The algorithm is simple and performs well, but it is sensitive to its initial values, so it tends to converge to a local minimum rather than the global optimum; this increases the number of iterations and can ultimately cause the clustering to fail. To address this weakness, the thesis proposes a fuzzy C-means clustering algorithm based on the density of instances. With this approach the cluster centers lie closer to the actual class centers, the number of iterations is reduced, and the clustering result improves. Experiments on simulated data sets and UCI data sets show that the improved algorithm is effective.

Weka, a Java-based open-source data mining tool, has attracted the attention of data mining researchers because of its rich functionality and ease of use. However, Weka is weak in clustering, so the thesis studies Weka's development environment, its interface specifications, and the specific method and implementation steps for adding new algorithms. On this basis it implements a hierarchical clustering algorithm (SimpleChameleon), the fuzzy C-means clustering algorithm, and the improved fuzzy C-means clustering algorithm.

To further verify its effectiveness, the improved algorithm is applied to social insurance audit data. Analysis shows that this data is large in volume, covers many payment types, and contains redundant records, so the thesis first preprocesses it through data consolidation and attribute selection. Then, according to the four clustering objectives for each area, comparative experiments are run with the traditional fuzzy C-means clustering algorithm and the improved algorithm. The results show that the improved algorithm reduces the number of iterations while improving the clustering result, which again confirms its effectiveness.
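
As an illustration of the approach summarized above, the following Python sketch pairs the standard fuzzy C-means updates (membership update, then weighted center update) with one possible density-based choice of starting centers. The abstract does not define the density measure or say exactly how it is used, so `density_init`, its `radius` parameter, and the neighbor-count density are assumptions made for illustration and not the thesis's method. The fuzzifier `m`, the tolerance `tol`, and `max_iter` are likewise illustrative defaults.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def density_init(points, c, radius):
    """Hypothetical seeding: take the c densest points that lie more than
    `radius` apart. Density here is assumed to be the neighbor count within
    `radius`; the thesis may define it differently."""
    density = [sum(1 for q in points if dist(p, q) <= radius) for p in points]
    order = sorted(range(len(points)), key=lambda i: -density[i])
    centers = []
    for i in order:
        if all(dist(points[i], ctr) > radius for ctr in centers):
            centers.append(list(points[i]))
        if len(centers) == c:
            break
    # Fallback: if too few separated points exist, fill with the densest rest.
    for i in order:
        if len(centers) == c:
            break
        if list(points[i]) not in centers:
            centers.append(list(points[i]))
    return centers


def memberships(points, centers, m):
    """Standard FCM membership update; each row sums to 1."""
    exp = 2.0 / (m - 1.0)
    u = []
    for p in points:
        d = [dist(p, ctr) for ctr in centers]
        if any(x == 0.0 for x in d):
            # The point coincides with a center: give it full membership there.
            zeros = [1.0 if x == 0.0 else 0.0 for x in d]
            u.append([z / sum(zeros) for z in zeros])
        else:
            u.append([1.0 / sum((di / dk) ** exp for dk in d) for di in d])
    return u


def fcm(points, centers, m=2.0, tol=1e-5, max_iter=100):
    """Iterate FCM from the given starting centers.
    Returns (centers, memberships, iterations used)."""
    centers = [list(ctr) for ctr in centers]
    dims = len(points[0])
    for it in range(1, max_iter + 1):
        u = memberships(points, centers, m)
        new_centers = []
        for i in range(len(centers)):
            w = [u[j][i] ** m for j in range(len(points))]
            total = sum(w)
            new_centers.append(
                [sum(w[j] * points[j][k] for j in range(len(points))) / total
                 for k in range(dims)]
            )
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # stop once no center moves more than tol
            break
    return centers, memberships(points, centers, m), it
```

Calling `fcm(points, density_init(points, c, radius))` returns the final centers, the membership matrix, and the number of iterations used. Running `fcm` from other starting centers on the same data gives a simple way to compare iteration counts, in the spirit of the comparative experiments the abstract describes.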
Keywords/Search Tags: Weka platform, fuzzy C-means clustering algorithm, density of instances, social insurance audit