Functional approaches for high dimensional gene expression data

Posted on: 2009-12-26
Degree: Ph.D
Type: Dissertation
University: University of California, Davis
Candidate: Chen, Kun
Full Text: PDF
GTID: 1448390002492585
Subject: Biology
Abstract/Summary:
This research targets problems in high-dimensional data spaces, particularly high-dimensional gene expression data. Two specific problems are studied, and both involve functional data approaches.

In the first problem, time-course microarray data measured under two conditions from thousands of subjects are used to identify differentially expressed genes that respond to a change in condition. We develop and employ the Functional Principal Component (FPC) approach to summarize the dynamics of the gene trajectories. Each gene trajectory is represented by a set of orthogonal basis functions that reflect the major modes of variation in the data and are estimated from the data. The correlation structure of the gene expressions over time is also incorporated without any parametric assumptions and is estimated by borrowing information from all genes. The parameters are estimated with a very efficient hybrid EM algorithm. The proposed method is compared with the standard two-way mixed ANOVA method on a real data set and in simulation. With few model assumptions, FPC analysis shows better sensitivity and specificity than the two-way mixed ANOVA.

The second problem involves predicting survival time with high-dimensional covariates. We propose Stringing, an extension of Functional Data Analysis (FDA) in which one views high-dimensional predictors as functional data. Stringing of high-dimensional data is implemented with distance-based metric Multidimensional Scaling (MDS), which maps predictors to locations on a real interval such that strongly correlated predictors are located close to each other. Stringing thus generates a sample of random functions, one for each subject. The dimension of these high-dimensional random functions is then reduced using methods from Functional Data Analysis. We study this new approach for the problem of predicting survival time from high-dimensional predictors. A novel functional Cox regression model is proposed and implemented by supervised iterative selection of predictor subsets. We compare Stringing with existing methods for survival regression with high-dimensional predictors, and demonstrate superior performance of Stringing in an application with gene expressions as predictors for survival time.
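
The ordering step of Stringing can be illustrated with a minimal sketch. This is not the dissertation's implementation: it assumes the squared dissimilarity 2 * (1 - Pearson correlation) between predictors and uses classical (Torgerson) MDS in one dimension, computed by power iteration. The abstract does not specify either choice. The function names `pearson` and `stringing_order` and the toy data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def stringing_order(columns, n_iter=500, seed=0):
    """Order predictors along a line so correlated ones sit close together.

    columns: list of p predictor vectors, each holding one value per subject.
    Returns the predictor indices sorted by their 1-D MDS coordinate.
    """
    p = len(columns)
    # Assumed dissimilarity: squared distance 2 * (1 - correlation).
    d2 = [[0.0 if i == j else 2.0 * (1.0 - pearson(columns[i], columns[j]))
           for j in range(p)] for i in range(p)]
    # Double-center the squared distances (classical MDS Gram matrix).
    row_mean = [sum(r) / p for r in d2]
    grand_mean = sum(row_mean) / p
    gram = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(p)] for i in range(p)]
    # Power iteration for the leading eigenvector of the Gram matrix.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(n_iter):
        w = [sum(gram[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate case: all predictors identical
            break
        v = [x / norm for x in w]
    # The eigenvector gives 1-D locations up to scale and sign.
    return sorted(range(p), key=lambda j: v[j])


# Toy data: even-indexed predictors follow one latent factor, odd-indexed another.
rng = random.Random(1)
f1 = [rng.gauss(0, 1) for _ in range(60)]
f2 = [rng.gauss(0, 1) for _ in range(60)]
cols = [[(f1 if j % 2 == 0 else f2)[i] + 0.1 * rng.gauss(0, 1) for i in range(60)]
        for j in range(6)]
order = stringing_order(cols)
print(order)
```

Reading each subject's predictor values in the returned order gives one function per subject, which could then be passed to a functional dimension-reduction step. The direction of the ordering is arbitrary because an eigenvector is determined only up to sign.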
Keywords/Search Tags:Gene, Data, Dimensional, Functional, Survival time, Predictors, Stringing