Microarray Data Analysis Based On Biological Knowledge

Posted on: 2007-07-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Fang
Full Text: PDF
GTID: 1100360242461384
Subject: Biomedical engineering
Abstract/Summary:
In the last decade of the 20th century, the microarray became one of the most important techniques in biology. Microarray technology made it possible to monitor the expression levels of thousands of genes simultaneously, during biological processes or across collections of related samples, and provided a high-level research method for studying protein synthesis. Microarray experiments are now mostly performed at the genome level and generate large amounts of expression data, so analyzing these data has become a challenge. How to infer the regulatory relationships among genes from these data, and beyond that the molecular mechanisms underlying biological phenomena, is a difficult but active area of bioinformatics.

Generally, genes that show co-expression patterns under the same experimental conditions have similar biological functions or participate in the same cellular process, and may also be co-regulated by the same transcription factor. Incorporating biological knowledge is therefore becoming a trend in expression data analysis. Such knowledge may include sequence alignments, protein structures and biological functions, all of which can guide the analysis. Microarray data analysis based on biological knowledge can effectively avoid the shortcomings of purely mathematical methods.

Previous methods based on biological knowledge have shown advantages in microarray data analysis. Nevertheless, current research neglects the integration of different kinds of information. Accordingly, this thesis presents three approaches to microarray data analysis based on biological knowledge: Gene Ontology guided clustering, enhanced gene set enrichment analysis and functional module analysis. The main contents and conclusions are as follows:

(1) A clustering algorithm based on Gene Ontology is developed to analyze microarray data.
In this algorithm, the tree structure of the Gene Ontology (GO) serves as the framework for clustering. The genes in the microarray data are mapped to the GO tree through their corresponding GO terms. After the GO tree is traversed at every level, gene clusters with both similar expression patterns and correlated functions are produced. The algorithm was validated on two well-known public data sets and the results were compared with previous work. The algorithm shows advantages in both the quality of the clusters and the precision of the biological annotations. Furthermore, a comparison with the similar software GO-Cluster shows that the results of this method are more coherent in expression patterns and biological functions.

(2) Gene set enrichment analysis is improved by incorporating information on the expression correlation of the genes within each gene set into the original method. The enhanced gene set enrichment analysis evaluates the significance of a gene set in terms of both the differential expression levels and the expression correlations of the genes it contains. Compared with the original method, its results are more significant and have a lower false discovery rate. In addition, the enhanced analysis may identify more significant gene sets than the original one, including some gene sets that are correlated with the phenotype but cannot be identified by the original gene set enrichment analysis.

(3) Mouse genome microarray data on the cellular regulatory mechanisms involving estrogen and its cardio-protective role during ischemia-reperfusion in mammals are analyzed using functional modules. The results provide preliminary evidence of estrogen's cardio-protective effect during ischemia-reperfusion and identify some pathways related to the cardio-protective mechanism.
The experiment has three factors: gender, whether a gene (the p-450 aromatase gene) is knocked out, and treatment (control, ischemia, ischemia-reperfusion). KEGG pathways are collected as functional modules. The modules that respond differently to different factor combinations are identified by various statistical tests.

Microarray data analyses based on biological knowledge may produce results of greater biological significance. The quality of the results depends on both the richness and the accuracy of the biological knowledge. We believe that biological knowledge-based analysis of microarray data will become more and more powerful as biological knowledge improves.
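
As an illustration of the GO-guided clustering idea in item (1), the following is a minimal sketch and not the algorithm of the thesis. It assumes a toy term hierarchy, a mean pairwise Pearson correlation as the co-expression criterion, and a threshold `min_corr`; all term names, gene names, expression values and function names are hypothetical, since the abstract does not specify them.

```python
from collections import deque
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def genes_under(term, children, annotations):
    """Genes annotated to a term or to any of its descendants."""
    genes = set(annotations.get(term, ()))
    for child in children.get(term, ()):
        genes |= genes_under(child, children, annotations)
    return genes


def mean_pairwise_correlation(genes, expression):
    """Average Pearson correlation over all gene pairs (needs >= 2 genes)."""
    pairs = list(combinations(sorted(genes), 2))
    total = sum(pearson(expression[a], expression[b]) for a, b in pairs)
    return total / len(pairs)


def go_guided_clusters(root, children, annotations, expression,
                       min_corr=0.8, min_size=2):
    """Visit the term tree level by level; keep terms whose genes co-express.

    The acceptance rule (mean pairwise correlation >= min_corr) is an
    assumption made for this sketch, not a rule stated in the abstract.
    """
    clusters = {}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        # Only genes that were actually measured on the array are considered.
        genes = {g for g in genes_under(term, children, annotations)
                 if g in expression}
        if (len(genes) >= min_size
                and mean_pairwise_correlation(genes, expression) >= min_corr):
            clusters[term] = sorted(genes)
        queue.extend(children.get(term, ()))
    return clusters


# Hypothetical toy hierarchy, annotations and expression profiles.
children = {"root": ["A", "B"], "A": ["A1"]}
annotations = {"A": ["g1"], "A1": ["g2", "g3"], "B": ["g4", "g5"]}
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.1],
    "g3": [1.1, 2.1, 2.9, 4.2],
    "g4": [4.0, 3.0, 2.0, 1.0],
    "g5": [1.0, 3.0, 2.0, 4.0],
}

clusters = go_guided_clusters("root", children, annotations, expression)
print(clusters)
```

In this toy run the genes under term `A` and under its child `A1` are kept as clusters, while the root and term `B` are rejected because their genes are not coherently expressed. Each reported cluster carries a functional label (the term) as well as an expression pattern, which is the point of using the hierarchy as the clustering framework.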
Keywords/Search Tags: microarray, expression profile, biological knowledge, Gene Ontology, clustering, gene set enrichment analysis