Novel Fuzzy Clustering Algorithms And Applications

Posted on: 2012-02-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C F Gao
Full Text: PDF
GTID: 1118330332991565
Subject: Light Industry Information Technology and Engineering
Abstract/Summary:
Novel fuzzy clustering algorithms and their applications in biological fields are studied, with emphasis on fuzzy clustering theory and practical problems in applied biotechnology. The work is an interdisciplinary topic spanning computational intelligence and applied biotechnology, and it is significant for both theoretical research and practical applications. The thesis follows two research lines. In the theoretical line, new fuzzy clustering algorithms are proposed. In the bioinformatics line, new computational intelligence methods are proposed that aim to solve practical problems in applied biotechnology. Because computational intelligence is a useful tool for mining the complex information in biological data, the two lines of theory and application are integrated. The main contributions of the thesis are summarized as follows.

1. Two new kernel-based fuzzy clustering algorithms are proposed: collaborative kernel fuzzy clustering and weighted fuzzy kernel clustering. An improved collaborative kernel fuzzy c-means clustering (CKFCM) algorithm is proposed, in which a collaborative relationship function is incorporated into kernel fuzzy c-means clustering (KFCM). CKFCM uses a kernel function to map the observed data to a higher-dimensional feature space, which can enlarge the differences among samples. CKFCM runs on several feature subsets can be processed together under one objective function, which improves clustering performance by letting the partition matrices of the different feature subsets collaborate. CKFCM therefore yields more separable centers and better classification. An improved weighted fuzzy kernel clustering algorithm (WFKCA) is also proposed to overcome the tendency of WFKCA to become trapped in a local optimum.
The idea of the iterative self-organizing data analysis technique algorithm (ISODATA) is introduced into WFKCA: the initial center vectors are adjusted using the intermediate results of splitting and/or merging cluster centers, which reduces the chance of ending in a local optimum. The improved algorithm uses a matching measure defined in the feature space and widens the adjustment range of the cluster centers, so its clustering performance is more stable.

2. Clustering algorithms based on fuzzy scatter matrices are studied. First, because previous algorithms use an inaccurate iterative expression for the cluster centers, an improved clustering algorithm based on the Fuzzy Fisher Criterion (FFC) with new center equations is proposed. Second, an integrated fuzzy Fisher clustering (IFFC) method that combines supervised and unsupervised clustering is developed, and a novel IFFC-based classifier for recognizing secretory proteins is designed. The classifier is suitable for intelligent prediction in biology, and its model is convenient for users to train. Last, an automatic technique for determining a reasonable number of clusters for complex biological datasets is proposed. The key computation is implemented by an optimization algorithm that reflects the idea of intra-cluster compactness and inter-cluster separability; the reasonable cluster number is then determined by the maximum of the second-order difference of the objective function. The new method can automatically obtain a reasonable cluster number for complex datasets.

3. Previous feature extraction methods for protein sequences are suited to independent sequences and are of limited use for heterologous proteins fused in-frame to a signal peptide. A structural fusion degree (SFD) is defined to measure the compatibility between target proteins and signal peptides, and the interaction between fused signal peptides and the adjacent protein residues is analyzed mathematically.
A mathematical model of the extended signal region and the protein is proposed. SFD features are extracted from this model to recognize the secretability of heterologous proteins, and the proposed model gives satisfactory results.

4. A recently developed semi-supervised fuzzy clustering algorithm with pairwise constraints is studied. In that algorithm, the penalty cost function and the basic objective function differ in order of magnitude, which causes over-adjustment of the membership values. To solve this problem, an improved algorithm based on a redefined objective function is proposed. A new constraint function is added to the basic objective function as a penalty cost, giving a new semi-supervised optimization problem. The new penalty cost function agrees and cooperates well with the basic objective function, and it produces more accurate clustering results by moderately raising or lowering ambiguous membership values.
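
For orientation, the kernel fuzzy c-means (KFCM) baseline named in contribution 1 can be sketched as below. This is a minimal illustration of the commonly used Gaussian-kernel form of KFCM, not the thesis's CKFCM or WFKCA, whose collaborative and weighting terms the abstract does not specify. The function names, the kernel width `sigma`, the fuzzifier `m`, the toy data, and the explicit `init_centers` argument are all assumptions made for this example.

```python
import math


def gaussian_kernel(x, v, sigma):
    """Gaussian kernel K(x, v) = exp(-||x - v||^2 / sigma^2) (assumed kernel form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq_dist / sigma ** 2)


def memberships(data, centers, m, sigma):
    """u[i][k] is proportional to (1 - K(x_k, v_i)) ** (-1 / (m - 1)), normalized over clusters."""
    u = [[0.0] * len(data) for _ in centers]
    for k, x in enumerate(data):
        # Clamp 1 - K away from zero so a point lying exactly on a center is handled.
        w = [max(1.0 - gaussian_kernel(x, v, sigma), 1e-12) ** (-1.0 / (m - 1.0))
             for v in centers]
        total = sum(w)
        for i in range(len(centers)):
            u[i][k] = w[i] / total
    return u


def kfcm(data, init_centers, m=2.0, sigma=2.0, max_iter=100, tol=1e-8):
    """Alternate membership and center updates until the centers stop moving."""
    centers = [list(v) for v in init_centers]
    dim = len(data[0])
    for _ in range(max_iter):
        u = memberships(data, centers, m, sigma)
        new_centers = []
        for i, v in enumerate(centers):
            # Each point contributes to center i with weight u^m * K(x_k, v_i).
            w = [(u[i][k] ** m) * gaussian_kernel(x, v, sigma)
                 for k, x in enumerate(data)]
            total = sum(w)
            if total == 0.0:
                # All kernel values underflowed; keep the previous center.
                new_centers.append(list(v))
                continue
            new_centers.append(
                [sum(wk * x[d] for wk, x in zip(w, data)) / total for d in range(dim)]
            )
        shift = max(abs(a - b)
                    for old, new in zip(centers, new_centers)
                    for a, b in zip(old, new))
        centers = new_centers
        if shift < tol:
            break
    # Recompute memberships so they match the returned centers.
    return centers, memberships(data, centers, m, sigma)


# Toy data (made up for illustration): two well-separated groups in the plane.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, u = kfcm(data, init_centers=[data[0], data[-1]])
```

The initial centers are passed in explicitly because this kind of alternating update depends on where it starts: seeding both centers inside the same group can leave them there, which is the local-optimum behaviour that contribution 1 aims to reduce.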
Keywords/Search Tags: Semi-supervised clustering, fuzzy kernel clustering, collaborative clustering, fuzzy Fisher clustering, secretion of protein, feature extraction, recognition of signal peptide, classifier design, bioinformatics