
Research On Fuzzy Clustering Algorithm Of Gene Expression Data

Posted on: 2012-03-08 | Degree: Master | Type: Thesis
Country: China | Candidate: X M Li | Full Text: PDF
GTID: 2298330452961755 | Subject: Computer application technology
Abstract/Summary:
With the development and application of microarray technology, clustering analysis of gene expression data has become a hot topic in the post-genome era following the completion of the Human Genome Project. Gene function and gene regulation information can be obtained by clustering analysis of gene expression data. Such analysis has played an important role in identifying regulatory elements and constructing gene networks, and it has had a significant impact on research and applications in biomedicine. Fuzzy clustering is one of the important methods for analyzing gene expression data. Because a gene may share similar functions with different groups of genes, fuzzy clustering algorithms are used to identify overlapping clusters of genes. Based on the special properties of gene expression data and the gene regulatory mechanism, this dissertation addresses how to effectively mine the biological information contained in gene expression data.

The complex relationships between genes and the complexity of the gene regulatory mechanism raise new challenges for information extraction and traditional clustering analysis. In view of the many co-regulation relationships among genes and the special properties of gene expression data, this dissertation presents a new similarity measure. The measure can, to some extent, overcome the undesirable impact of noisy and abnormal data on clustering results, and it can reflect many co-regulation relationships. To address the disadvantages of the traditional fuzzy clustering algorithm, a dynamic fuzzy clustering algorithm based on an immune genetic algorithm is proposed. The proposed algorithm can effectively avoid local optima.

The traditional fuzzy clustering algorithm is an unsupervised clustering technique that considers only the mathematical characteristics of genes rather than their biological ones. It is therefore necessary to develop a method that incorporates biological prior knowledge into clustering analysis and can improve clustering performance. Incorporating two common sources of functional information about genes, Gene Ontology information and KEGG Pathway data, this dissertation proposes a semi-supervised fuzzy clustering algorithm for gene expression data based on multi-source fusion, i.e. MF-SFC. A two-step procedure is applied. In the first step, the proposed method uses a fuzzy c-means clustering with prior biological knowledge, called GOFuzzy, to cluster the genes with known functions. Labeled data are then generated from the clustering results, and pairwise constraints are generated from KEGG pathway data. In the second step, the labeled data and pairwise constraints are used in the MF-SFC algorithm to cluster genes. This method allows genes with known functions to be assigned to other clusters. Experimental results show that the clustering results obtained by the proposed method are more likely to be biologically significant.
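
For readers unfamiliar with the baseline, the following is a minimal sketch of standard (unsupervised) fuzzy c-means, the traditional algorithm the abstract refers to. It is not the GOFuzzy or MF-SFC method of the thesis, and it uses plain Euclidean distance rather than the similarity measure the thesis proposes. The function name `fuzzy_c_means`, its parameters, and the toy `profiles` data are assumptions made purely for illustration. The point of the example is that each gene receives a membership degree in every cluster, so a gene lying between two groups can belong partly to both.

```python
import random


def fuzzy_c_means(data, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (illustrative sketch, Euclidean distance).

    data: list of equal-length numeric lists (one expression profile per gene)
    c: number of clusters; m: fuzzifier (> 1)
    Returns (centers, memberships), where memberships[i][j] is the degree
    to which profile i belongs to cluster j; each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])

    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Cluster centers: means weighted by membership^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * data[i][d] for i in range(n)) / w_sum for d in range(dim)]
            )

        # Memberships: inversely related to relative distance to each center.
        new_u = []
        for i in range(n):
            dist = [
                max(sum((data[i][d] - centers[j][d]) ** 2 for d in range(dim)) ** 0.5, 1e-12)
                for j in range(c)
            ]
            new_u.append(
                [
                    1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0)) for k in range(c))
                    for j in range(c)
                ]
            )

        change = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(c))
        u = new_u
        if change < tol:
            break

    return centers, u


# Hypothetical toy "expression profiles": two tight groups plus one
# profile (the last) that lies between them.
profiles = [
    [0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
    [5.0, 5.1], [5.1, 5.0], [5.0, 5.0],
    [2.5, 2.5],
]
centers, memberships = fuzzy_c_means(profiles, c=2)
for profile, row in zip(profiles, memberships):
    print(profile, [round(v, 3) for v in row])
```

In this toy run the in-between profile ends up with comparable membership in both clusters, which is the overlapping-cluster behavior the abstract describes. Because the cluster centers are found by iterative local updates from a random start, the result can depend on initialization, which is the local-optimum issue the abstract says the immune genetic algorithm is meant to avoid.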
Keywords/Search Tags: gene expression data, fuzzy clustering, similarity measure, immune genetic algorithm, semi-supervised learning