Concordant Integrative Analysis of Multiple Gene Expression Data Sets

Posted on: 2015-07-13

Degree: Ph.D.

Type: Dissertation

University: The George Washington University

Candidate: Zhang, Fanni

Full Text: PDF

GTID: 1470390020450729

Subject: Statistics

Abstract/Summary:

Microarray technology is an experimental method by which tens of thousands of genes can be printed on a small chip, enabling us to measure genome-wide expression profiles. Because the cost of a microarray experiment is still relatively high, the sample size of such an experiment is usually small. For some important disease studies, microarray data have been collected by different laboratories. We expect to obtain more efficient analysis results if different data sets collected for the same or similar studies can be integrated. However, because of many complicating experimental issues, the genome-wide concordance among these data sets must be evaluated before they are analyzed together. If the underlying behavior of a gene is consistent among different experiments, then the related expression profiles in different data sets will be concordant. Statistically, mixture models have been widely used to accommodate unobserved heterogeneity in a study population. A mixture-model-based method has been proposed for concordant integrative analysis when two microarray data sets are available. This approach needs to be extended to the integrative analysis of multiple data sets.

The general statistical framework for our integrative analysis is the partial concordance/discordance (PCD) model. The difficulty in estimating it is that its parameter space grows exponentially with the number of data sets. Since the complete concordance (CC) model and the complete independence (CI) model are two basic statistical frameworks that can be derived from the PCD model, we propose a two-level mixture model to approximate the PCD model. It combines the basic CC and CI models, and its parameter space grows linearly with the number of data sets. We have implemented an expectation-maximization algorithm for estimating the model parameters. Simulation studies have been conducted to understand the performance of our method. We have also applied our method to a collection of microarray gene expression data sets from a lung cancer study.

Furthermore, we have developed other approaches that reduce the parameter space of the PCD model by simplifying the non-diagonal proportion parameters. The inspiration comes from the exchangeable and AR(1) structures used in generalized estimating equations (GEE), as well as the multiset coefficient in combinatorics. We again use the expectation-maximization algorithm for model fitting. The performance of the proposed methods is examined in simulation studies. We have also compared these methods with the two-level mixture model based method by applying them to the same experimental data sets from the lung cancer study.
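
The contrast the abstract draws between exponential and linear growth of the parameter space can be illustrated with a small counting sketch. The abstract does not state how many latent states a gene may take, nor the exact parameterization of each model, so everything below is an assumption made for illustration: each gene is assumed to take one of `n_states` states in each data set (3 is used in the example), only the mixing proportions are counted, and the function names are hypothetical. The multiset count shows one plausible way the multiset coefficient could shrink the parameter space (treating configurations as unordered); it is not claimed to be the dissertation's actual construction.

```python
from math import comb


def pcd_free_proportions(n_states: int, n_datasets: int) -> int:
    """Free proportions if every joint state configuration has its own weight.

    Assumption: a full PCD-style model assigns one proportion to each of the
    n_states ** n_datasets configurations, and the proportions sum to 1.
    """
    return n_states ** n_datasets - 1


def two_level_free_proportions(n_states: int, n_datasets: int) -> int:
    """Free proportions in an assumed two-level mixture of CC and CI.

    Assumption: one top-level weight (CC versus CI), one proportion vector
    shared by all data sets under CC, and one proportion vector per data set
    under CI. Each proportion vector has n_states - 1 free entries.
    """
    top_level = 1
    cc_part = n_states - 1
    ci_part = n_datasets * (n_states - 1)
    return top_level + cc_part + ci_part


def multiset_configurations(n_states: int, n_datasets: int) -> int:
    """Number of unordered state configurations across the data sets.

    This is the multiset coefficient C(n_states + n_datasets - 1, n_datasets):
    the count of configurations when the order of data sets is ignored.
    """
    return comb(n_states + n_datasets - 1, n_datasets)


if __name__ == "__main__":
    n_states = 3  # assumed number of states per gene, for illustration only
    print("data sets | full PCD | two-level | unordered configurations")
    for k in range(2, 9):
        print(
            f"{k:9d} | {pcd_free_proportions(n_states, k):8d} | "
            f"{two_level_free_proportions(n_states, k):9d} | "
            f"{multiset_configurations(n_states, k):24d}"
        )
```

Under these assumptions, each additional data set multiplies the number of joint configurations by `n_states` but adds only `n_states - 1` proportions to the two-level count, which matches the exponential versus linear growth described in the abstract.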

Keywords/Search Tags:

Data sets, Integrative analysis, Model, Method, Gene, Expression, Experimental, Microarray

Related items

1	Research On Gene Expression Data Analysis Method And Its Application
2	Study On Statistical Methods For Analyzing Gene Expression Microarray Data Under Mixed Linear Model Framework
3	Research On Feature Analysis Methods In Microarray Gene Expression Data
4	CePa: a New Method To Identify Significant Gene Sets And The Construction Of Online Data Analysis Platform
5	The Improvement In The Method Of Multi-platform Microarray Data Integration
6	Finding Differential Gene Expression Using Probabilistic Methods
7	A Study On Some Issues About Gene Expression Data Analysis
8	Methods for cluster analysis and validation in microarray gene expression data
9	Study On Methods For Microarray Data Analysis Based On Mixed Linear Model Approach And Conditional Variable Analysis
10	Statistical Methods For Microarray Data Analysis