On using block principal component analysis for reducing gene-expression data dimensions

Posted on:2006-01-01

Degree:M.S.

Type:Thesis

University:University of Nevada, Las Vegas

Candidate:Lee, Sang Hee

GTID:2458390008955862

Subject:Statistics

Abstract/Summary:

Because a microarray gene expression database contains a large number of variables and a relatively small number of samples, analyzing it requires computationally intensive, high-dimensional methods. Principal component analysis (PCA) is a useful tool for reducing the number of dimensions and, therefore, the complexity. PCA allows us to analyze the gene expression database in a relatively low-dimensional space without losing relevant information, and it makes the structure of the data easier to see. The initial PCA computation, however, involves calculating a high-dimensional covariance or correlation matrix and requires time and hardware resources that are limited in most real situations.

In this thesis, we propose to use the Block Principal Component Analysis (Block PCA) method, introduced by Liu et al. (2002), to produce a subset that can explain a large amount of the variation, and we propose a criterion for finding the most appropriate subsets.

Gene expression data are typically highly correlated, so the covariance matrix becomes highly ill-conditioned. The Mahalanobis distances produced by software packages such as SAS are not reliable in such cases. We investigate the effect of ill-conditioning on discriminant analysis of gene expression data from a DNA microarray. The bioinformatics literature recommends forming blocks of variables that are correlated with one another. We propose the method of Partial Least Squares (PLS) to form the blocks of correlated variables for use in Block PCA.
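The two-stage idea behind block PCA can be illustrated with a minimal sketch. It assumes the blocks of variables are already given as index arrays and that components are kept until a fixed share of the variance is explained. The function names (`pca_scores`, `block_pca`), the `var_threshold` parameter, and the synthetic data are hypothetical choices made for illustration; the sketch does not reproduce the exact procedure of Liu et al. (2002), the PLS-based block formation, or the subset-selection criterion proposed in the thesis.

```python
import numpy as np


def pca_scores(X, var_threshold=0.9):
    """Return the leading principal-component scores of X (samples x variables)
    that together explain at least `var_threshold` of the total variance."""
    Xc = X - X.mean(axis=0)  # center each variable
    # SVD of the centered data avoids forming the full covariance matrix.
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2
    total = var.sum()
    if total == 0:
        return Xc[:, :0]  # no variance: return zero components
    ratio = np.cumsum(var) / total
    # Smallest k whose cumulative variance share reaches the threshold.
    k = min(int(np.searchsorted(ratio, var_threshold)) + 1, len(s))
    return U[:, :k] * s[:k]


def block_pca(X, blocks, var_threshold=0.9):
    """Two-stage PCA: PCA within each block, then PCA on the pooled scores.

    `blocks` is a list of column-index arrays (assumed to be given)."""
    # Stage 1: each block only needs a decomposition of its own columns.
    block_scores = [pca_scores(X[:, idx], var_threshold) for idx in blocks]
    pooled = np.hstack(block_scores)
    # Stage 2: combine the retained block components with one more PCA.
    return pca_scores(pooled, var_threshold)


# Synthetic stand-in for expression data: 20 samples, 200 "genes",
# split into 4 contiguous blocks of 50 (an arbitrary illustrative choice).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 200))
demo_blocks = [np.arange(i, i + 50) for i in range(0, 200, 50)]
Z_demo = block_pca(X_demo, demo_blocks)
print(Z_demo.shape)
```

Working block by block means each decomposition involves only that block's variables, which is the source of the savings in time and memory when the number of genes is very large.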

Keywords/Search Tags:

Principal component analysis, Block, Data, Gene, Expression, PCA, Using, Variables

Related items

1	Research On Dimensionality Reduction Of Gene Expression Data Based On Traditional Feature Extraction And Deep Learning
2	Application Of Adaptive Principal Component Extraction To Gene Expression Data
3	Research On Classification Of Gene Expression Data Based On Adjacency Matrix Decomposition
4	Research Of Support Vector Machine For The Analysis Of Gene Expression Data
5	Using principal component analysis (PCA) to obtain auxiliary variables for missing data in large data sets
6	Research And Realization Of Face Recognition Algorithm Based On Block PCA
7	Study Of Gene Expression Data Analysis Based On Pattern Recognition Methods
8	Data Analysis Of Cancer Gene Expression Based On SVM-RFE Algorithm
9	Research On Classification Algorithms In Gene Expression Data Analyzing
10	Research On Biclustering Methods For Gene Expression Data Analysis