Font Size: a A A

Research On Fuzzy Clustering Algorithm And Its Application

Posted on:2010-05-18Degree:DoctorType:Dissertation
Country:ChinaCandidate:F H QuFull Text:PDF
GTID:1118360272996717Subject:Computational Mathematics
Abstract/Summary:PDF Full Text Request
With the development of the internet and information technology, human access more and more data every day. An important tool to deal with these data is cluster analysis. Compared with traditional methods of hard clustering, fuzzy clustering has a better ability to describe the data structure, which makes it become the mainstream of the clustering algorithms. In 1969, Ruspini first introduced the concept of fuzzy set theory into the cluster analysis fields. Since then many clustering methods based on fuzzy set theory have been proposed. The objective function based fuzzy clustering algorithms are the most popular used method in fuzzy clustering. These algorithms obtain the fuzzy partitions and clustering results of the data sets by solving a nonlinear programming problem. FCM-type algorithms (here mainly refer to the fuzzy objective function based partition clustering algorithms which are derived from FCM) are one of the most important classes of fuzzy clustering algorithms. A great deal of research has been conducted on FCM-type algorithms. In 2008, Miyamoto et al. mainly discussed the latest developments and practical applications of FCM-type algorithms in their works.This dissertation mainly focus on the studying of the FCM-type algorithms, and some algorithms are proposed to overcome the defect of the original algorithms. We prove the convergence theorem of a class of kernel based fuzzy clustering algorithms.FCM model is a probabilistic type constraint clustering model. It has been successfully applied into many areas. In the literature, there are a large number of researches have been conducted on FCM, they mainly focuse on the improvements of the model, parameter selection, study of the validity of clustering and how to obtain better solutions of the optimization problems. In these algorithms, the most two important algorithms are the possibilistic c-means clustering algorithm (PCM), which was proposed by Krishnapuram and Keller, and the possibilistic fuzzy c-means clustering algorithm (PCM), which was proposed by Pal and Bezdek.In FCM, the membership for a datum crossing all classes sums to 1, which makes its membership disable to reflect the real distance from the datum to the cluster center, and this is also the reason why FCM is sensitive to noise and outliers. To address this problem, PCM relaxes the constraint of FCM, and a penalty item is added on the objective function to avoid the meaningless optimal solutions. Although PCM solves the noise sensitivity problem of FCM, it also generates some new problems. Solving the optimization problem of PCM is equivalent to solve a number of independent sub-optimization problems, and each sub-optimization problem is dependent on only one center, which makes it tend to generate coincident clusters. From the experiments we found that, the global optimal solution is obtained only when all the clusters are coincident, that is to say we must try to obtain suboptimal solutions to get meaningful partitions. It obviously violates the target of designing the model. To solve the problems in FCM and PCM, Pal and Bezdek put forward the possibilistic fuzzy c-means clustering model. PFCM have the advantages of both FCM and PCM, in the mean while, it avoids the defects of both algorithms, i.e., PFCM does not generate coincident clusters and it also has a good noise robustness. There are c + 5 parameters which are set by the user in PFCM, however, how to set these parameters still has no theoretical basis, which makes PFCM have a strong dependence on the parameter settings. FCM, PCM and PFCM algorithms tend to find compact spherical clusters, they lacks the ability of indicating non-convex cluster structures of the data set. To address this issue, kernel technique is introduced into such algorithms, and kernel-based FCM, PCM and PFCM algorithm have been proposed. Such algorithms have the ability of finding the clusters with non-convex structures, but our experiments show that these algorithms also inherit the defect of the original algorithms, such as sensitivity to noise (in FCM), tending to generate coincident clusters (in PCM) and having more parameters (in PFCM).Considering the above problems, we put forward a kernel based improved fuzzy c-means clustering model, and three different forms algorithms (IKFCM1, IKFCM2 and IKDFCM algorithm) are obtained from this model. IKFCM1 and IKFCM2 have the ability of finding non-convex clusters, and IKDFCM has a better noise robustness and lower time complexity. The contrast experiments with the above algorithms show the clustering performance of the proposed algorithm.The above discussed algorithms still exist or partly exist the following problems: (1) The algorithm requires the assumption on the number of clusters and can not execute unsupervised clustering. (2) Solving the optimization problem of the clustering model is equivalent to solving a number of independent sub-optimization problems, and each sub-optimization problem is dependent on only one center, which makes it tend to generate coincident clusters. (3) These algorithms eventually boil down to the optimization problem of minimizing the objective function, and the traditional iterative method can not guarantee to converge to the global optimal solution, although it can be solved through the introduction of global optimization methods, it requires high time complexity to get precise global optimum. To solve the above problems, we put forward GKPCM (Generalized Kernel Possibilistic C-Means clustering) clustering model, which is a generalization of the conventional possibilistic and kernel based possibilistic clustering model. By limiting the feasible region of the solutions and using the global optimization techniques (here uses the simulated annealing as an example) to solve the optimization problem, GKPCM inherits the advantages of robustness to noise and avoids the problem of generating coincident clusters, and can well find the global optimal solution, and speed up the convergence speed by decreasing the search range of the global optimization techniques.Kernel-based fuzzy clustering is a new branch of fuzzy clustering algorithms, both of two important recent books of fuzzy clustering algorithms have sole part to introduce the kernel based fuzzy clustering algorithms. Although kernel clustering algorithm has many good properties, the convergence of such algorithms is still not studied. It is well-known that convergence is the basis for the iterative algorithms. In this dissertation, we establish the convergence of different algorithms that are derived from two kernel based fuzzy clustering model (including KFCM and IKFCM model). Take the KFCM for example, we give the following convergence theorem of KFCM2 and KDFCM: Theorem 0.1 (convergence of KFCM2) Let X contain at least n≥c≥2 points, fuzzy factor m > 1 , the rank of kernel matrix K is denoted by R(K) which satisfies R(K)≥c, let U0 be the starting point of iteration with U0∈Mfc1 , let {U(k)}k=1,2,…denote the sequence generated by the KFCM2 algorithm, then {U(k)}k=1,2,… either terminates at a point in (?) or there is a subsequence converging to a point in (?), where (?); (1)(?). (2)Theorem 0.2 (convergence of KDFCM ) Let X contain at least n > c > 2 different points , fuzzy factor m > 1 , K is the kernel function,θ∈Rs, f(z;θ) = K(z, z) -2K(θ, z) is a convex function with respect to z , let V(0) be the starting point of iteration with V(0)∈Rs , let {U(k), V(k)}k=1,2,… denote the sequence generated by the KDFCM algorithm, then {U(k), V(k)}k=1,2,… either terminates at a point in (?) or there is a subsequence converging to a point in (?), where (?)= {(U*, V*)∈Mfc1×Rsc |(?); (3)(?). (4)The other algorithms of different forms of IKFCM and KFCM model also have the similar convergence theorems. The convergence theorem of FCM and IFCM can be obtainned from the proved theorems of KFCM and IKFCM if we set the kernel function for some special forms. Hence the convergence theorem of KFCM is a generalization of the original FCM algorithm, and the convergence theorem of IKFCM is the first proof for the IFCM algorithm. The theorems show that the convergence of IKFCM and KFCM has a close relations with the ranks of the kernel matrix and the convexity of the kernel functions, which also provide us a theoretical basis of selecting reasonable kernel functions when executing such algorithms.An important issue of the objective function based fuzzy clustering algorithms is that such algorithm is sensitive to the initializations. Inspired by the convergence theorem 6.2.1, we propose a new thought of partitioning the data by the domain of attraction of the fixed point. Based on this idea, a new mean shift clustering algorithm called unsupervised multi-scale fuzzy clustering (UMF) is put forward, and a cluster validity is proposed to determine the final partition of UMF. Compared to the traditional fuzzy clustering, UMF has the following characteristics: UMF is not influenced by the initializations of the cluster centers; it can indicate the data structure from different "partition scales"; it can execute unsupervised clustering for the given data set. A fast vision of UMF is also presented in order to meet the need of dealing with the large data sets. Finally, we apply UMF to improve the EM based mixture-resolving methods and the fuzzy clustering algorithms, the experimental results show the performance of UMF.
Keywords/Search Tags:Fuzzy clustering, feature space, convergence, cluster validity, global optimization, unsupervised multi-scale fuzzy clustering
PDF Full Text Request
Related items