Research On Clustering Algorithms And Its Applications To Bioinformatics

Posted on: 2013-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: L Chen
GTID: 2248330371464542
Subject: Computer application technology
Abstract/Summary:
This dissertation studies clustering algorithms and their applications. Clustering analysis has been studied for a long time and is widely used in machine learning, pattern recognition, data mining, image processing, and bioinformatics. We analyze the shortcomings of fuzzy clustering algorithms, look for ways to overcome them, and propose several improved fuzzy clustering algorithms. The main contributions are summarized as follows.

(1) Semi-supervised fuzzy clustering is a relatively new learning approach that lies between supervised and unsupervised clustering; it uses partial prior information to guide the entire clustering procedure. However, like traditional fuzzy clustering algorithms, it is sensitive to initialization and easily becomes trapped in a local optimum. To address this problem, an improved semi-supervised fuzzy clustering algorithm based on QPSO is proposed. The algorithm takes full advantage of the global search ability of QPSO and can thereby avoid becoming trapped in a local optimum. To ensure that the optimal number of clusters can be determined automatically, we define a new fitness function. The improved algorithm outperforms the traditional semi-supervised fuzzy clustering algorithm in clustering accuracy and stability.

(2) Most traditional fuzzy clustering algorithms share a common problem: the number of clusters must be preset. In practice, however, the number of clusters in a dataset is difficult to obtain, or it changes as the dataset grows. Under these conditions, the traditional algorithms cannot work efficiently. We therefore propose an improved fuzzy C-means clustering algorithm based on discrete QPSO, together with a new criterion function that ensures the optimal number of clusters can be determined automatically.
Experimental results show that our algorithm finds the optimal number of clusters.

(3) Semi-supervised fuzzy clustering partitions a large number of unlabeled samples by using a small amount of prior information. Collaborative fuzzy clustering exploits the collaboration between different feature subsets, and it can be combined with other clustering algorithms to develop new and better ones. So far, however, there has been no research on integrating collaborative clustering with semi-supervised fuzzy clustering. We therefore integrate the two and propose a collaborative semi-supervised fuzzy clustering algorithm. We analyze experimentally the clustering accuracy, the change in membership, and the influence of the collaboration coefficient and the pairwise constraints on the clustering results. The experimental results show that the improved algorithm achieves better clustering performance.

(4) Combining multiple clusterings is a new research direction that merges several different clustering results to obtain a new and better one. Inspired by the theory of combining classifiers proposed by Kittler, we develop a new method for combining multiple fuzzy clusterings, CMFC. CMFC uses Kittler's six rules (Product, Sum, Max, Min, Median, and Majority Vote), with the probability replaced by the membership produced by the fuzzy clustering algorithm. Experimental results on the Iris and Wine datasets demonstrate that the new method has better clustering performance. We also analyze how the similarity between the individual fuzzy clustering algorithms affects the combined result.
The more dissimilar the individual fuzzy clustering algorithms are, the better the clustering performance gained by combining them.
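
The six combination rules named in contribution (4) can be illustrated with a minimal Python sketch. This is not the thesis's CMFC implementation: the function name `combine_memberships`, the toy membership matrices, and the assumption that cluster columns are already aligned across the individual clusterings are all introduced here for illustration only.

```python
import numpy as np

def combine_memberships(memberships, rule="sum"):
    """Combine fuzzy membership matrices with one of six Kittler-style rules.

    memberships: list of (n_samples, n_clusters) arrays, one per clustering.
    Assumption: column k refers to the same cluster in every matrix.
    Returns the index of the winning cluster for each sample.
    """
    U = np.stack(memberships)  # shape: (n_clusterings, n_samples, n_clusters)
    if rule == "product":
        scores = U.prod(axis=0)
    elif rule == "sum":
        scores = U.sum(axis=0)
    elif rule == "max":
        scores = U.max(axis=0)
    elif rule == "min":
        scores = U.min(axis=0)
    elif rule == "median":
        scores = np.median(U, axis=0)
    elif rule == "vote":
        # Each clustering votes for its highest-membership cluster per sample.
        votes = U.argmax(axis=2)  # shape: (n_clusterings, n_samples)
        n_clusters = U.shape[2]
        scores = np.stack(
            [(votes == k).sum(axis=0) for k in range(n_clusters)], axis=1
        )
    else:
        raise ValueError(f"unknown rule: {rule}")
    return scores.argmax(axis=1)

# Toy memberships (hypothetical data): 3 clusterings, 3 samples, 2 clusters.
U1 = np.array([[0.9, 0.1], [0.40, 0.60], [0.2, 0.8]])
U2 = np.array([[0.8, 0.2], [0.70, 0.30], [0.3, 0.7]])
U3 = np.array([[0.7, 0.3], [0.45, 0.55], [0.1, 0.9]])

for rule in ("product", "sum", "max", "min", "median", "vote"):
    labels = combine_memberships([U1, U2, U3], rule)
    print(rule, labels.tolist())
```

In this toy data the rules agree on the first and third samples but split on the ambiguous second sample, which is the kind of case where the choice of rule matters. In practice the clusters found by different algorithms would first have to be matched to one another, a step this sketch assumes has already been done.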
Keywords/Search Tags:Fuzzy clustering, semi-supervised clustering, pairwise constraints, discrete QPSO, criteria function, collaborative clustering, combining multiple clusterings, bioinformatics