An Unsupervised Clustering Method Based On Gene Ontology

Posted on: 2012-02-23; Degree: Master; Type: Thesis
Country: China; Candidate: M C Gao; Full Text: PDF
GTID: 2178330335982431; Subject: Biological Information Science and Technology
Abstract/Summary:
With the development of DNA microarray technology, it has become possible to obtain massive amounts of gene expression data, and mining useful information from these data has become a pressing problem. Clustering analysis is one of the most widely used and effective methods of gene expression analysis. It groups genes with similar expression patterns into the same cluster. Genes within a cluster may have similar or related functions, so the function of unknown genes can be predicted from the known genes in the same cluster. In this study, we use hierarchical clustering, K-means, Self-Organizing Map and fuzzy C-means to analyze gene expression data. We also improve fuzzy C-means by introducing knowledge from Gene Ontology. First, we determine the number of clusters from the biological process ontology, which overcomes the shortcoming that the number of clusters is usually unknown in advance. Second, because traditional fuzzy C-means is sensitive to the initial cluster centers, we use the credibility of the evidence codes in gene annotations to determine the initial membership. We compare the performance of the various clustering algorithms using the z-score. In addition, to better compare ontology-based fuzzy C-means with traditional fuzzy C-means, we apply three different validity functions to evaluate the results. These functions measure the compactness within clusters and the separation between clusters, based on the geometric structure of the data set. The best initial parameter values under each validity function were also determined. We found that the Amine M. Bensaid validity function is the most suitable for evaluating validity on this data set.
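The idea of seeding fuzzy C-means with an annotation-derived membership matrix can be illustrated with a minimal sketch. The abstract does not give the weighting scheme, so everything specific here is an assumption: the function names `initial_membership` and `fuzzy_c_means`, the `EVIDENCE_WEIGHT` values, the `floor` parameter, and the mapping of annotations to cluster indices are all hypothetical and not taken from the thesis. Only the fuzzy C-means update rules themselves are the standard ones.

```python
import numpy as np

# Hypothetical credibility weights for a few GO evidence codes.
# The actual values used in the thesis are not stated in the abstract.
EVIDENCE_WEIGHT = {"IDA": 1.0, "TAS": 0.8, "IEA": 0.3}


def initial_membership(annotations, n_clusters, floor=0.05):
    """Build a (n_clusters, n_genes) initial membership matrix.

    annotations[k] is a list of (cluster_index, evidence_code) pairs for
    gene k; unannotated genes get a uniform membership.
    """
    U0 = np.full((n_clusters, len(annotations)), floor)
    for k, gene_annots in enumerate(annotations):
        for cluster, code in gene_annots:
            U0[cluster, k] += EVIDENCE_WEIGHT.get(code, 0.1)
    return U0 / U0.sum(axis=0, keepdims=True)  # columns sum to 1


def fuzzy_c_means(X, U0, m=2.0, max_iter=100, tol=1e-5):
    """Standard fuzzy C-means started from a given membership matrix U0.

    X has shape (n_genes, n_features); U0 has shape (n_clusters, n_genes).
    """
    U = U0 / U0.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Squared Euclidean distance from every center to every gene.
        d2 = ((X[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)  # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    # Recompute centers so they match the returned memberships.
    Um = U ** m
    centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
    return centers, U


# Toy expression profiles: two genes near (0, 0), two near (5, 5).
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
annotations = [[(0, "IDA")], [], [(1, "TAS")], []]
centers, U = fuzzy_c_means(X, initial_membership(annotations, 2))
```

In this sketch the number of clusters is simply passed in as `n_clusters`; in the approach described above it would come from the biological process ontology rather than being chosen by hand.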
By comparing the validity of the results of fuzzy C-means and ontology-based fuzzy C-means, we found that applying Gene Ontology helps to achieve clusters with better compactness within clusters and better separation between clusters. Finally, a comparison of the results produced by the different clustering algorithms shows that Gene Ontology can greatly improve clustering performance.
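A validity function of the kind described above, one that rewards compact clusters and well-separated centers, can be sketched as follows. The abstract names the Bensaid validity function but gives no formula, so the ratio below (fuzzy within-cluster scatter divided by fuzzy cluster size times the summed squared distance to the other centers) is an assumption about its form and should be checked against the thesis. The function name `compactness_separation_index` is hypothetical. Lower values indicate a better partition.

```python
import numpy as np


def compactness_separation_index(X, centers, U, m=2.0):
    """Sum over clusters of compactness / (fuzzy size * separation).

    X: (n_genes, n_features), centers: (n_clusters, n_features),
    U: (n_clusters, n_genes) membership matrix. Smaller is better.
    """
    # Squared distance from every center to every gene: (n_clusters, n_genes).
    d2 = ((X[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2)
    compact = ((U ** m) * d2).sum(axis=1)  # within-cluster scatter
    size = U.sum(axis=1)                   # fuzzy cardinality of each cluster
    # Summed squared distance from each center to all other centers.
    diff = centers[:, None, :] - centers[None, :, :]
    separation = (diff ** 2).sum(axis=2).sum(axis=1)
    return float((compact / (size * separation)).sum())


# Toy data: two tight, well-separated groups.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])

# A partition that matches the groups ...
U_good = np.array([[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]])
centers_good = np.array([[0.05, 0.0], [5.05, 5.0]])

# ... and one that mixes them.
U_bad = np.array([[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]])
centers_bad = np.array([[2.5, 2.5], [2.6, 2.5]])

good = compactness_separation_index(X, centers_good, U_good)
bad = compactness_separation_index(X, centers_bad, U_bad)
```

Here `good` comes out far smaller than `bad`, which is the behavior one would use to compare a traditional fuzzy C-means result against an ontology-based one on the same data.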
Keywords/Search Tags: Gene Ontology, Clustering Analysis, Validity