
Meta analysis methods for microarray data and proteomics data

Posted on: 2009-06-30
Degree: Ph.D.
Type: Dissertation
University: University of California, Los Angeles
Candidate: Lin, Wen
GTID: 1448390005957926
Subject: Biology
Abstract/Summary:
In recent years, the number of scientific articles studying microarray and proteomics data has grown exponentially. This abundance of data calls for the development of meta analysis methods. Although meta analysis has been studied extensively for many years, not all traditional methods are suitable for microarray or proteomics data. Microarray data have far more variables than samples, and their measurement reliability is limited by factors such as sample quality and adherence to laboratory protocols. Microarray and proteomics analyses usually include some form of feature selection, which poses challenges for traditional meta analysis methods. We propose meta analysis methods for drawing general conclusions from multiple microarray data sets. Although we evaluate the methods on microarray data, we hope to adapt them to proteomics data as well.

In Chapter 1, we give a technological introduction to microarray and proteomics data. We present some initial results regarding the analysis of proteomics data. In Chapter 3, we present a method that assigns each gene a reproducibility score measuring how consistent its expression level is across different studies. To calculate the reproducibility scores, we use correlations and weighted gene network terminology. Based on the reproducibility score, we calculate a weighted combined p-value for evaluating gene significance, so that among genes with the same p-value, the more reproducible ones are favored. We provide evidence that weighted combined p-values can lead to more meaningful results than traditional approaches that combine p-values across studies. In Chapter 4, we present our main meta analysis method, which uses gene co-expression information to calculate a meta analysis score for ranking genes. This score is composed of two parts: a gene-specific part and a co-expression part.
The relative weight of each part is controlled by a parameter delta (δ). We use both real and simulated data to evaluate the methods presented in Chapters 3 and 4. Although both the reproducibility-score-based method and the co-expression-based method achieve better validation rates than traditional methods, the improvement achieved by the latter is larger. We find empirical evidence that incorporating co-expression information into the meta analysis score increases the validation rate. The methods presented in Chapters 3 and 4 were developed mainly for analyzing multiple independent microarray data sets; however, we expect that they can also be applied to other types of data, such as proteomics data. In Chapter 5, we describe potential future research avenues.
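The abstract gives no formulas for either score, so the Python sketch below is only one plausible reading, built on stated assumptions rather than the dissertation's actual definitions. It assumes that (1) a gene's reproducibility score is the average, over pairs of studies, of the correlation between that gene's correlation profiles (its correlations with all other genes) in the two studies; (2) per-study one-sided p-values are combined with Stouffer's Z method and then divided by weights that are proportional to reproducibility and average to 1 (a standard p-value weighting scheme substituted here for illustration); and (3) the meta analysis score mixes a gene-specific statistic z with an adjacency-weighted average of its neighbors' statistics, using a WGCNA-style soft-threshold adjacency |cor|^beta. All function names and the parameters `eps` and `beta` are hypothetical; `delta` plays the role of δ.

```python
import math
from itertools import combinations
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corr_matrix(expr):
    """Gene-by-gene correlations; expr[i] holds gene i's values across samples."""
    g = len(expr)
    return [[pearson(expr[i], expr[j]) for j in range(g)] for i in range(g)]


def reproducibility(studies):
    """Hypothetical score: agreement of each gene's correlation profile across studies."""
    if len(studies) < 2:
        raise ValueError("need at least two studies")
    mats = [corr_matrix(s) for s in studies]
    g = len(mats[0])
    scores = []
    for i in range(g):
        # Gene i's correlations with every other gene (diagonal dropped), per study.
        rows = [[m[i][j] for j in range(g) if j != i] for m in mats]
        pair_r = [pearson(a, b) for a, b in combinations(rows, 2)]
        # Average over study pairs; negative agreement is clipped to 0.
        scores.append(max(0.0, sum(pair_r) / len(pair_r)))
    return scores


def stouffer_p(pvals):
    """Combine one gene's one-sided per-study p-values (each in (0, 1)) via Stouffer's Z."""
    z = sum(_N.inv_cdf(1.0 - p) for p in pvals) / math.sqrt(len(pvals))
    return 1.0 - _N.cdf(z)


def weighted_pvalues(pvals_by_gene, rep, eps=1e-3):
    """Divide combined p-values by weights proportional to reproducibility (mean 1)."""
    raw = [r + eps for r in rep]  # eps keeps every weight strictly positive
    mean_raw = sum(raw) / len(raw)
    return [min(1.0, stouffer_p(ps) / (r / mean_raw))
            for ps, r in zip(pvals_by_gene, raw)]


def meta_score(z, studies, delta=0.5, beta=6):
    """(1 - delta) * gene-specific part + delta * co-expression part."""
    mats = [corr_matrix(s) for s in studies]
    g = len(z)
    scores = []
    for i in range(g):
        # Soft-threshold adjacency |cor|^beta, averaged over studies; no self-edge.
        adj = [sum(abs(m[i][j]) ** beta for m in mats) / len(mats) if j != i else 0.0
               for j in range(g)]
        total = sum(adj)
        coexpr = sum(a * zj for a, zj in zip(adj, z)) / total if total > 0 else 0.0
        scores.append((1 - delta) * z[i] + delta * coexpr)
    return scores
```

Setting `delta=0` recovers the gene-specific statistic alone, while `delta=1` ranks each gene purely by the evidence carried by its co-expressed neighbors. With the weighting in `weighted_pvalues`, two genes with identical per-study p-values receive different weighted p-values, and the more reproducible gene ranks higher.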
Keywords/Search Tags: Data, Meta analysis, Chapter