Research On Fuzzy Clustering Algorithm Under The Weka Platform

Posted on: 2014-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: S Y X Tran Thi Anh
GTID: 2268330422954100
Subject: Computer application technology
Abstract/Summary:
Cluster analysis is an important tool that was traditionally defined as “precisely” assigning samples to different classes. However, the properties of many real-world objects are ambiguous, which makes “precisely” classifying those objects infeasible. This is where fuzzy clustering comes in. Fuzzy clustering is one of the most important branches of data mining and has been widely applied in fields such as classification, geology, business activities, pattern recognition, and image processing. Because the uncertainty in fuzzy clustering results can describe the vagueness of real-life samples, fuzzy clustering can reflect the real world more objectively, which makes the study of fuzzy clustering algorithms and their applications increasingly important.

Weka is a comprehensive data mining platform developed by the University of Waikato, New Zealand. It provides data mining researchers with a number of useful data pre-processing and processing methods, including classification, clustering, and association rules, as well as a variety of methods for evaluating algorithms. As an open-source platform, Weka has very good scalability and compatibility, as well as well-defined data structures and basic statistical interfaces, which make it a very powerful tool for developers.

The main purpose of this thesis is to find effective solutions to existing problems in fuzzy clustering in order to improve clustering results and the stability and robustness of clustering algorithms, while reducing manual intervention and the amount of expert knowledge required. To that end, this thesis studies Harmony Search (HS)-based and spectral analysis-based fuzzy clustering algorithms, which are implemented in Weka to enrich the platform and expand its range of applications.

The main work and innovations are as follows:

First, the Global Dynamic Adaptive Clustering Harmony Search K-Harmonic Means (GDACHSKHM) algorithm is proposed. This is a meta-heuristic fuzzy clustering algorithm based on a combination of the HS and K-Harmonic Means (KHM) algorithms. GDACHSKHM not only takes advantage of HS’s global search ability to find the global best solution, but is also capable of detecting the number of clusters by analyzing the intrinsic characteristics of the dataset. During operation, the parameter values are adapted automatically based on the results of each iteration, without manual adjustment, which reduces manual intervention. Experiments verify the effectiveness and robustness of GDACHSKHM.

Second, the Spectral Differential Evolution KHM (SPDEKHM) algorithm is proposed. It consists of two steps: a spectral mapping step and a fuzzy clustering step. Spectral mapping is in effect the process of solving the eigenvalue problem of the Laplacian matrix; it maps the data from the original space to a low-dimensional space, reducing the data dimension and clarifying the structure of the dataset. During that process, the number of clusters is also detected based on the eigengaps of the Laplacian matrix. In the second step, the hybrid algorithm DEKHM, which combines the merits of Differential Evolution (DE) and KHM, is used to cluster the data points in the mapped space. The experiments in this chapter also show that SPDEKHM achieves high clustering accuracy in most cases.

Finally, the idea of spectral clustering is applied to the community structure detection problem in complex networks, which leads to the proposal of the Correlation Spectral Mapping Automatic Community Detection (CSMACD) algorithm. CSMACD first transforms the network community detection problem into a clustering problem using a mapping process based on the spectrum of the correlation matrix. It then uses the DEKHM algorithm to partition the network while automatically detecting the number of communities, with modularity as its objective function. Experiments on both synthetic and real networks show that CSMACD can accurately analyze the structure of complex networks whose communities differ in size, demonstrating the robustness of the algorithm.
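All three proposed algorithms build on K-Harmonic Means, so a small illustration of that building block may help. The sketch below follows the commonly cited textbook formulation of KHM (harmonic-mean objective, soft memberships, and weighted center update) in plain Python. It is not the thesis's Weka implementation and does not include the Harmony Search, Differential Evolution, or spectral mapping parts; the function names, the `EPS` guard, and the default exponent `p=3.5` are assumptions made for this example.

```python
import math

EPS = 1e-12  # assumed guard: avoids division by zero when a point sits on a center


def dist(x, c):
    """Euclidean distance, clamped away from zero."""
    return max(math.dist(x, c), EPS)


def khm_objective(points, centers, p=3.5):
    """Sum over points of the harmonic mean of the p-th power distances to all centers."""
    k = len(centers)
    return sum(k / sum(dist(x, c) ** -p for c in centers) for x in points)


def khm_memberships(points, centers, p=3.5):
    """Soft (fuzzy) membership of each point in each cluster; every row sums to 1."""
    rows = []
    for x in points:
        raw = [dist(x, c) ** (-p - 2) for c in centers]
        total = sum(raw)
        rows.append([r / total for r in raw])
    return rows


def khm_step(points, centers, p=3.5):
    """One KHM update: recompute every center as a membership- and weight-scaled mean."""
    n, dim = len(points), len(points[0])
    member = khm_memberships(points, centers, p)
    # Per-point weight: points that are far from every center get more influence.
    weights = []
    for x in points:
        num = sum(dist(x, c) ** (-p - 2) for c in centers)
        den = sum(dist(x, c) ** -p for c in centers) ** 2
        weights.append(num / den)
    new_centers = []
    for j in range(len(centers)):
        coef = [member[i][j] * weights[i] for i in range(n)]
        total = sum(coef)
        new_centers.append(tuple(
            sum(coef[i] * points[i][d] for i in range(n)) / total
            for d in range(dim)
        ))
    return new_centers
```

Calling `khm_step` repeatedly from some initial centers and reading `khm_memberships` at the end gives a fuzzy partition. In the hybrid algorithms described above, a global optimizer such as HS or DE would supply or refine the candidate centers instead of relying on a single starting point.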
Keywords/Search Tags: Fuzzy Clustering, Weka, Harmony Search, Spectral Clustering, Community Detection