FCM Algorithm Improvement With KPCA Method As The Core

Posted on:2016-09-29

Degree:Master

Type:Thesis

Country:China

Candidate:W Yan

Full Text:PDF

GTID:2308330461454758

Subject:Computer application technology

Abstract/Summary:

The fuzzy c-means (FCM) algorithm is a partition-based clustering method. In the 40 years since it was invented, many scholars have studied it in depth and produced a variety of improved algorithms. These algorithms can be used to analyze many types of data from different industries, and they are effective analytical tools. With the internet and the mobile internet developing rapidly, cluster analysis, one of the key techniques of data mining, and the FCM algorithm in particular have great potential for growth and considerable research value.

The theory of kernel functions was proposed about one hundred years ago and was applied to machine learning half a century later. In 1992, kernel functions were introduced into support vector machine theory. Since then, kernel functions and the kernel trick have attracted wide attention, and both were applied in many different fields within a very short time. For example, introducing the kernel trick into the FCM algorithm gives the Kernel-FCM (KFCM) algorithm. The kernel trick is simple to apply even though its theoretical derivation is complicated. In brief, the kernel trick uses a kernel function to compute the inner product in a high-dimensional space and uses that inner product in place of the low-dimensional similarity. Because the kernel function expresses the high-dimensional inner product through operations on the low-dimensional samples, it is convenient to operate on data in the high-dimensional space even without knowing the details of that space.

Principal component analysis (PCA) is a feature extraction technique. It constructs new features as linear combinations of the original data features and discards the unimportant ones. In doing so it eliminates redundancy and noise in the feature space and reduces the dimension of the data. The PCA method has a narrow scope of application. Kernel principal component analysis (KPCA) introduces the kernel trick into the PCA method, so it has a wider scope of application and a new function.

Based on the theories and methods mentioned above, this thesis explains in detail the advantages and defects of the FCM algorithm and the KPCA method, and proposes a density-based equalization FCM algorithm based on the KPCA method (KPCA-DBEFCM) and a number-of-clusters automatic-adjusting algorithm (KPCA-NCAA). Several simulation experiments were designed in Matlab 7.11 to test the effectiveness of the proposed methods. The datasets used in the experiments are synthetic. By analyzing and summarizing the experimental results, this thesis clarifies the effectiveness and defects of the proposed methods and outlines directions for further study.

The experimental results indicate that the proposed KPCA-DBEFCM algorithm can cluster unbalanced datasets effectively. The proposed KPCA-NCAA algorithm can effectively handle the clustering result when the target number of clusters is set too high, and can adjust the number of clusters to the optimal number. The KPCA-DBEFCM algorithm is easy to modify to suit a specific situation. It uses density information in a novel way to construct the equalization term, and this way of using density information may serve as a reference for other work. The KPCA-NCAA algorithm uses the KPCA method in a novel way. This thesis analyzes in detail the feasibility and meaning of this usage and explains the preconditions and limitations of the algorithm. In the course of the argument, the clustering result and the original dataset are treated equally, and the Cluster Feature Sub-Space is used to analyze the algorithm. These two approaches may also serve as a reference for other work.
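
The kernel trick described in the abstract can be illustrated with a small sketch. It is not taken from the thesis: the experiments there were run in Matlab, while this example uses Python, and the degree-2 polynomial kernel, the function names, and the two sample points are all assumptions chosen only because the explicit high-dimensional mapping is easy to write down for this kernel. The sketch checks that the kernel value computed from the low-dimensional samples equals the inner product of the explicitly mapped samples, and that a squared distance in the high-dimensional space can be obtained from kernel values alone, which is the kind of quantity a kernel-based clustering method needs.

```python
import math


def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel: k(x, y) = (x . y)^2."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot ** 2


def poly2_feature_map(x):
    """Explicit 3-D feature map for 2-D inputs (illustrative assumption).

    Its inner product reproduces poly2_kernel:
    phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2).
    """
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def kernel_distance_sq(kernel, x, y):
    """Squared feature-space distance using kernel values only:
    ||phi(x) - phi(y)||^2 = k(x, x) - 2 k(x, y) + k(y, y).
    """
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)


# Two arbitrary 2-D sample points (hypothetical data).
x = (1.0, 2.0)
y = (3.0, -1.0)

# Inner product in the feature space, computed two ways.
phi_x = poly2_feature_map(x)
phi_y = poly2_feature_map(y)
explicit_inner = sum(a * b for a, b in zip(phi_x, phi_y))
kernel_inner = poly2_kernel(x, y)

# Squared distance in the feature space, computed two ways.
explicit_dist_sq = sum((a - b) ** 2 for a, b in zip(phi_x, phi_y))
kernel_dist_sq = kernel_distance_sq(poly2_kernel, x, y)

print("inner product:", explicit_inner, kernel_inner)
print("squared distance:", explicit_dist_sq, kernel_dist_sq)
```

In each printed pair the two values agree up to floating-point rounding, although the second value never builds the mapped vectors. For kernels such as the Gaussian kernel the explicit mapping is infinite-dimensional, so only the kernel-based route is practical.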

Keywords/Search Tags:

FCM, KPCA, EFCM, Number of Clusters Automatic-Adjusting, Cluster Feature Sub-Space

Related items

1	An Automatic Method To Determine The Number Of Clusters Based On Multi-Validity Indices
2	Algorithms Implementation Of Determining The Number Of Clusters And Initial Cluster Centers For Mixed Data
3	Research On Clustering Methods For The Data With Large Number Of Clusters
4	GMM Trees And Forests:Hierarchical Algorithms For Estimating The Number Of Clusters In High Dimensional Complex Data
5	Research And Application On Determining Optimal Number Of Clusters In Cluster Analysis
6	Research On Determining Optimal Number Of Clusters In Cluster Analysis
7	Research On Determining Optimal Number Of Clusters In Cluster Analysis
8	Research On Effective Internal Index Framework For Cluster Evaluation
9	Research On Determining The Number Of Clusters Based On Information Entropy
10	Research On The Robust And Adaptive Switching C-Regressions Models Based On Cluster Analysis