Study On Methods For Microarray Data Analysis Based On Mixed Linear Model Approach And Conditional Variable Analysis

Posted on: 2004-07-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Lu
Full Text: PDF
GTID: 1100360092485514
Subject: Crop Genetics and Breeding
Abstract/Summary:
Microarrays are becoming increasingly common laboratory tools for studying simultaneous changes in expression across a large number of genes. Recent developments in microarray technology make it possible to capture the expression profiles of thousands of genes at once. With this kind of data, researchers are tackling problems ranging from the identification of "cancer genes" to the formidable task of adding functional annotations to our rapidly growing gene databases. Given the high dimensionality (thousands of genes) and small sample sizes (often <30) encountered in these datasets, an honest assessment of sampling variability is crucial and can prevent the over-interpretation of spurious results. Substantial systematic and stochastic fluctuations are involved in microarray experiments. Cluster analysis and related techniques are proving very useful for exploring highly correlated patterns of gene expression. However, such exploratory methods alone do not support statistical inference or yield biologically meaningful results, and they are especially ill-suited to analyzing dynamic gene expression data, which are highly correlated across the time series. We describe a statistical framework that encompasses many of the analytical goals in gene expression analysis; our framework is completely compatible with many of the current approaches and, in fact, can increase their utility. The present study focuses on the identification of differentially expressed genes in microarray data. A method for microarray data analysis based on the mixed linear model approach is proposed. This method has been applied to the identification of differentially expressed genes and to the prediction of gene main effects and gene × environment interaction effects, both in a static state and during a developmental process. Computer simulations are used to investigate the efficiency and reliability of this method under a wide range of situations.
This method promises to shed light on the usefulness of partitioning gene expression levels into components attributable to different sources of variance. The main results and issues are summarized as follows:
1. A general genetic model for microarray data is developed, which includes the effects of gene, array, dye, and treatment, and the gene × array, gene × dye, and gene × treatment interactions. The terms of the model can be adjusted to suit the experimental design. In this study, the method is carried out in two separate steps. First, the microarray data are normalized to eliminate experiment-wide systematic effects, and differentially expressed genes are then provisionally identified under a lenient criterion via a single-gene model. Second, these candidate genes are confirmed under a stricter criterion via a multi-gene model, to control false positives. The variance and covariance components of each effect are estimated by the minimum norm quadratic unbiased estimation (MINQUE) method. The adjusted unbiased prediction (AUP) procedure is suggested for predicting random effects. The gene × treatment interaction is proposed as a measure for identifying differentially expressed genes.
2. Monte Carlo simulations are conducted to study the presented method for microarray data analysis based on the mixed linear model approach. The results indicate that this method can be more effective than the t-test approach and Wolfinger's mixed model approach in a wide range of situations. These results provide strong evidence that the gene × treatment interaction is a more appropriate measure for identifying differentially expressed genes.
3. The present study has shown that our method based on the mixed linear model approach yields unbiased or asymptotically unbiased predictions of random effects and estimates of fixed effects.
The unbiased predicted or estimated values of gene main effects and gene × treatment interactions can be used as a precursor to clustering, to ensure that the inputs are statistically meaningful and of biological interest.
4. In the present study we extend our mixed...
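For illustration, the effects listed in item 1 can be collected into a single model equation. This is only a sketch: the symbols, the subscripts, and the way the terms are arranged are assumptions made here for readability and are not taken from the dissertation, which may use different notation and may adjust the terms to suit the experimental design.

```latex
y_{ijkg} = \mu + A_i + D_j + T_k + G_g
         + (GA)_{gi} + (GD)_{gj} + (GT)_{gk}
         + \varepsilon_{ijkg}
```

In this assumed notation, $y_{ijkg}$ is the expression measurement for gene $g$ on array $i$, with dye $j$, under treatment $k$; $\mu$ is the overall mean; $A_i$, $D_j$, $T_k$, and $G_g$ are the array, dye, treatment, and gene main effects; $(GA)_{gi}$, $(GD)_{gj}$, and $(GT)_{gk}$ are the gene × array, gene × dye, and gene × treatment interactions; and $\varepsilon_{ijkg}$ is the residual. The term $(GT)_{gk}$ corresponds to the gene × treatment interaction that the abstract proposes as the measure for identifying differentially expressed genes.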
Keywords/Search Tags: DNA microarray, gene expression, mixed model, Monte Carlo simulation, conditional variable