
Mixture Models for Gene Expression Experiments with Two Species

Posted on: 2011-07-29
Degree: Ph.D.
Type: Dissertation
University: North Carolina State University
Candidate: Su, Yuhua
GTID: 1444390002968025
Subject: Biology
Abstract/Summary:
A bivariate mixture model that uses information from two species is proposed to address the fundamental problem of identifying differentially expressed genes in microarray experiments. Orthologs, genes in two different species that originated from a common ancestor, make it possible to exploit similarities between species to better understand the genetic basis of disease and treatment. The proposed approach models the distribution of the estimated treatment effects intuitively and with minimal assumptions. The mixture model posits up to nine components, four of which correspond to genes that are differentially expressed in both species. An EM algorithm is developed to carry out the nontrivial likelihood maximization, together with methodology for handling the singular covariance matrices that arise while the algorithm runs. A comprehensive simulation study and two applications to real-world data sets suggest that the proposed model, though highly structured, can handle a variety of situations and is practically useful, especially when the differential expression induced by the treatment is weak. The two data sets are a dog and human lymphoma data set prepared by a group of scientists in the College of Veterinary Medicine at North Carolina State University, and a mouse and human type II diabetes experiment sponsored by GlaxoSmithKline. In both applications, the proposed 9-component mixture model eliminates unimportant genes and identifies a list of genes that are potential biomarker candidates. Although the primary motivation for the bivariate mixture model is to identify genes whose differential expression extends from humans to another species, a possible extension to classification or prediction of cancer type or drug response is also explored in the two case studies.
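One natural reading of the nine-component structure is a 3×3 grid: in each species a gene is down-regulated, not differentially expressed, or up-regulated, and the four corner cells are the groups differentially expressed in both species. The Python sketch that follows fits such a mixture by EM to a matrix of estimated treatment effects (one row per ortholog pair). It is not the dissertation's algorithm. The names `PATTERNS`, `bvn_logpdf` and `fit_nine_component_em`, the bivariate normal components with their own full covariances, the rule that a mean coordinate is held at zero when its sign is 0, the initialization, and the small `ridge` added to every covariance (a simple stand-in for the dissertation's own handling of singular covariance matrices) are all assumptions. Because some components have one mean coordinate fixed at zero, their free coordinate is updated conditionally on the current covariance, which makes the M-step an ECM-style update.

```python
import numpy as np

# Hypothetical component layout: (species-1 sign, species-2 sign), where
# -1 = down-regulated, 0 = not differentially expressed, +1 = up-regulated.
PATTERNS = [(s1, s2) for s1 in (-1, 0, 1) for s2 in (-1, 0, 1)]


def bvn_logpdf(x, mean, cov):
    """Log density of a bivariate normal at each row of x (shape (n, 2))."""
    diff = x - mean
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    quad = np.einsum("ij,jk,ik->i", diff, inv, diff)  # per-row quadratic form
    return -0.5 * quad - 0.5 * logdet - np.log(2.0 * np.pi)


def fit_nine_component_em(x, n_iter=200, ridge=1e-6, tol=1e-8):
    """EM/ECM sketch for a 9-component bivariate normal mixture (assumed model).

    Mean coordinates whose sign is 0 are held at zero. `ridge` is added to
    every covariance estimate so that no matrix becomes singular; this is an
    assumed stand-in, not the dissertation's method.
    """
    n, k = x.shape[0], len(PATTERNS)
    sd = x.std(axis=0)
    # Assumed initialization: nonzero means start at +/- one standard deviation.
    means = np.array([[s1 * sd[0], s2 * sd[1]] for s1, s2 in PATTERNS], dtype=float)
    covs = np.array([np.cov(x.T) + ridge * np.eye(2) for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: posterior membership probabilities, computed with log-sum-exp.
        log_dens = np.column_stack(
            [np.log(weights[j]) + bvn_logpdf(x, means[j], covs[j]) for j in range(k)]
        )
        top = log_dens.max(axis=1, keepdims=True)
        log_total = top + np.log(np.exp(log_dens - top).sum(axis=1, keepdims=True))
        resp = np.exp(log_dens - log_total)
        ll = float(log_total.sum())
        # M-step (conditional): weights, constrained means, then covariances.
        nk = np.maximum(resp.sum(axis=0), 1e-10)  # floor avoids division by zero
        weights = nk / nk.sum()
        for j, pattern in enumerate(PATTERNS):
            w = resp[:, j]
            xbar = w @ x / nk[j]
            mean = np.zeros(2)
            free = [c for c in range(2) if pattern[c] != 0]
            if len(free) == 2:
                mean = xbar
            elif len(free) == 1:
                # Other coordinate fixed at 0: maximize over the free one given the
                # current covariance (a regression-adjusted weighted mean).
                c, o = free[0], 1 - free[0]
                prev = covs[j]
                mean[c] = xbar[c] - prev[c, o] / prev[o, o] * xbar[o]
            means[j] = mean
            diff = x - mean
            covs[j] = (w[:, None] * diff).T @ diff / nk[j] + ridge * np.eye(2)
        if abs(ll - prev_ll) < tol * abs(ll):
            break
        prev_ll = ll
    return weights, means, covs, resp, ll


# Illustrative use on simulated effects (hypothetical data, not from the study).
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 0.3, size=(300, 2)),   # mostly null genes
               rng.normal(3.0, 0.3, size=(50, 2))])   # up in both species
weights, means, covs, resp, ll = fit_nine_component_em(x)
labels = [PATTERNS[i] for i in resp.argmax(axis=1)]
both_de = [g for g, (s1, s2) in enumerate(labels) if s1 != 0 and s2 != 0]
print(len(both_de), "genes assigned to a 'DE in both species' component")
```

In this sketch, genes whose highest-posterior component is one of the four corners (`both_de`) are the candidates for differential expression in both species. Nothing forces an "up" component to keep a positive mean, so component labels should be checked after fitting.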
In the dog and human lymphoma study, a very small number of genes are identified as differentially expressed in both species, and the human genes in this cluster serve as a good predictor for classifying diffuse large B-cell lymphoma patients into two subgroups: germinal-center B-cell-like diffuse large B-cell lymphoma and activated B-cell-like diffuse large B-cell lymphoma. Moreover, the two subgroups defined by this cluster of human genes have significantly different survival functions, indicating that stratification based on the gene-expression profiles selected by the proposed 9-component mixture model provides useful insight into the clinical differences between the two types of cancer.
The application of the 9-component mixture model to the mouse and human type II diabetes experiment is less successful. Although the mixture model separates differentially expressed genes from non-differentially expressed ones, attempts to predict human drug response status from the genes identified as differentially expressed in both species were not as successful as in the lymphoma experiment. This may be because there is little evidence of any differential expression in this experiment: the linear model for week 8 expression in human genes was one of many possible models, and it did not uncover much evidence of a treatment effect. Nonetheless, a multi-gene predictor could still be developed from the genes identified by the proposed 9-component mixture model to support therapeutic decision making for patients.
Keywords/Search Tags: Mixture model, Species, Type II diabetes experiment, Expression, B-cell-like diffuse large B-cell lymphoma, Differentially expressed, Genes identified