Study And Analysis On Clustering Algorithm In Data Mining

Posted on: 2016-12-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
Full Text: PDF
GTID: 2348330488474487
Subject: Engineering
Abstract/Summary:
With the rapid development of the Internet and electronic commerce, the amount of data to be processed is growing rapidly. Data mining technology has emerged in response to this flood of data, drawing on machine learning, pattern recognition, statistics, artificial intelligence, and many other disciplines. Data mining is technically demanding and focuses on the scalable discovery of implicit knowledge from massive data. Clustering analysis is an important field of data mining, and the fuzzy C-means (FCM) clustering algorithm is the most widely used clustering analysis algorithm. With today's massive data sets, however, the disadvantages of FCM are particularly prominent: the algorithm converges slowly on large amounts of data; the number of clusters must be specified manually in advance, which introduces great uncertainty; and the algorithm is sensitive to noisy data, so its anti-noise performance is poor. To address these shortcomings, this thesis analyzes and integrates several improved FCM clustering algorithms. Building on the fuzzy-entropy-based FCM clustering algorithm, it further analyzes the FCM clustering algorithm with an added relative entropy constraint. This method adds the relative entropy to the objective function as an adjustment term in order to maximize the dissimilarity between different clusters, and it can assign low membership degrees with respect to all clusters to noise points, which effectively restrains the effect of noisy data on the cluster centers. It also introduces a relative entropy coefficient, which lets the user determine the importance of the relative entropy (divergence) term.
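For orientation, the baseline that the thesis sets out to improve can be sketched as follows. This is a minimal version of the textbook FCM iteration (alternating center and membership updates), not the thesis's integrated algorithm. The function name `fcm`, the fuzzifier `m = 2`, the stopping tolerance, and the random initialisation are all assumptions made for illustration.

```python
import math
import random


def fcm(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy C-means. Returns (centers, memberships).

    points : list of equal-length coordinate tuples
    c      : number of clusters, which must be chosen in advance
    m      : fuzzifier (> 1); m = 2 is a common default (assumption)
    """
    rng = random.Random(seed)
    n, p = len(points), len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(p)]
            )

        # Membership update: u_ik is proportional to d_ik ** (-2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dist = [math.dist(points[i], centers[k]) for k in range(c)]
            if min(dist) == 0.0:
                # Point coincides with a center: give it full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dist]
                new_row = [h / sum(hits) for h in hits]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
                total = sum(inv)
                new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break

    return centers, u


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    centers, memberships = fcm(data, c=2)
    print(sorted(centers))
```

Because every membership row is forced to sum to 1, an outlier far from all centers still receives substantial membership in some cluster and pulls on the centers. This is the noise sensitivity that the abstract describes.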
In addition, the opponent suppression approach proposed by Wei Limei is incorporated into this algorithm to accelerate convergence. To remove the need for the user to specify the number of clusters in advance, a cluster validity function is added so that the algorithm can determine the optimal number of clusters automatically. Finally, the integrated algorithm was implemented on the MATLAB simulation platform and applied to simple data sets, two-dimensional data sets, a three-dimensional data set, and the IRIS data set, and it was compared with the traditional FCM algorithm and the FCM algorithm based on fuzzy entropy. The experimental results show that the integrated FCM algorithm improves both noise resistance and convergence rate and can determine the optimal number of clusters automatically. The time complexity of the traditional FCM clustering algorithm is O(nc^2p), whereas that of the integrated algorithm is O(nc^3p) + O(2nc^2 log(nc)) + O(nc). The complexity has increased, but the Lambert W function W0(·) used in the relative entropy constraint for the FCM algorithm can be called directly in MATLAB, so it is relatively simple to compute.
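To give a sense of what evaluating W0(·) involves, the sketch below computes the principal branch of the Lambert W function, that is, the solution w >= -1 of w * exp(w) = x, using Newton's method. It is an illustration only: the abstract does not state how W0 enters the membership update, and the function name `lambert_w0`, the starting guess, and the tolerance are assumptions rather than details taken from the thesis or from MATLAB's routine.

```python
import math


def lambert_w0(x, tol=1e-12, max_iter=100):
    """Principal branch W0(x): the real w >= -1 with w * exp(w) == x.

    Defined for x >= -1/e. Close to the branch point x = -1/e the
    iteration converges more slowly and the result is less accurate.
    """
    if x < -1.0 / math.e:
        raise ValueError("W0(x) is real only for x >= -1/e")

    # log(1 + x) is never below W0(x) on this domain, so Newton's method
    # approaches the root monotonically from the right.
    w = math.log1p(x)
    for _ in range(max_iter):
        if w <= -1.0:
            # Round-off pushed the iterate onto the branch point.
            return -1.0
        # Newton step for f(w) = w * exp(w) - x, written so that
        # exp(w) is never evaluated for large positive w.
        step = (w - x * math.exp(-w)) / (w + 1.0)
        w -= step
        if abs(step) < tol * (1.0 + abs(w)):
            break
    return w


if __name__ == "__main__":
    for value in (0.5, math.e, 10.0):
        w = lambert_w0(value)
        print(value, w, w * math.exp(w))
```

Each evaluation takes a handful of iterations, which is consistent with the abstract's remark that the added cost of calling W0 is modest compared with the main clustering loop.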
Keywords/Search Tags:Data mining, Fuzzy C-means Clustering, Opponents Suppression type, Cluster validity function, Fuzzy entropy, Relative entropy