On using block principal component analysis for reducing gene-expression data dimensions

Posted on: 2006-01-01
Degree: M.S.
Type: Thesis
University: University of Nevada, Las Vegas
Candidate: Lee, Sang Hee
Full Text: PDF
GTID: 2458390008955862
Subject: Statistics
Abstract/Summary:
A microarray gene expression database contains a large number of variables but a relatively small number of samples, so analyzing it requires computationally intensive, high-dimensional methods. Principal component analysis (PCA) is a useful tool for reducing the number of dimensions and, therefore, the complexity. PCA allows us to analyze the gene expression database at a relatively small data dimension without losing relevant information, and it makes the structure of the data easier to see. The initial PCA computation, however, involves calculating a high-dimensional covariance or correlation matrix and requires time and hardware resources that are limited in most real situations.

In this thesis, we propose using the Block Principal Component Analysis (Block PCA) method, introduced by Liu et al. (2002), to produce a subset that can explain a large amount of the variation, and we propose a criterion for finding the most appropriate subsets.

Gene expression data are typically highly correlated, so the covariance matrix becomes highly ill-conditioned. The Mahalanobis distances produced by software packages such as SAS are not reliable in such cases. We investigate the effect of ill-conditioning on discriminant analysis of gene expression data from a DNA microarray. The bioinformatics literature recommends forming blocks of variables that are correlated with one another. We propose using Partial Least Squares (PLS) to form the blocks of correlated variables for use in Block PCA.
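
The block-wise idea described in the abstract can be illustrated with a minimal sketch. It is not the procedure of Liu et al. (2002) or of the thesis: the function name `block_pca`, the `var_threshold` parameter, the contiguous blocks, and the random data are all assumptions made for illustration. The point it shows is that running PCA on each block separately avoids forming one covariance matrix over all genes at once.

```python
import numpy as np


def block_pca(X, blocks, var_threshold=0.9):
    """Run PCA separately on each block of columns (hypothetical helper).

    X             : array of shape (n_samples, n_genes)
    blocks        : list of column-index lists, one per block
    var_threshold : fraction of each block's variance to retain (assumed rule)
    Returns the retained component scores of all blocks, stacked column-wise.
    """
    retained = []
    for idx in blocks:
        # Center the block so its singular values relate to variances.
        Xb = X[:, idx] - X[:, idx].mean(axis=0)
        # SVD of the centered block; no full gene-by-gene covariance is built.
        U, s, _ = np.linalg.svd(Xb, full_matrices=False)
        var = s ** 2
        total = var.sum()
        if total == 0:
            continue  # constant block carries no variation
        ratio = np.cumsum(var) / total
        # Smallest number of components reaching the threshold.
        k = min(int(np.searchsorted(ratio, var_threshold)) + 1, len(s))
        retained.append(U[:, :k] * s[:k])  # principal component scores
    return np.hstack(retained)


# Illustrative data only: 20 samples, 60 "genes", three contiguous blocks.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 60))
blocks = [list(range(0, 20)), list(range(20, 40)), list(range(40, 60))]
scores = block_pca(X, blocks)
print(scores.shape)
```

In practice the blocks would come from a grouping of correlated variables (the abstract proposes PLS for this step) rather than from contiguous index ranges.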
Keywords/Search Tags:Principal component analysis, Block, Data, Gene, Expression, PCA, Using, Variables