
Finding Differential Gene Expression Using Probabilistic Methods

Posted on: 2011-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
Full Text: PDF
GTID: 2120330338476257
Subject: Computer application technology
Abstract/Summary:
Affymetrix microarrays are currently the most widely used microarray technology. Finding differentially expressed genes is a fundamental objective of a microarray experiment and plays an important role in genetic diagnosis, medical treatment, drug screening, and so on. A microarray experiment is a complicated multi-step procedure, and variability exists in every step. This makes the generated data very noisy. In addition, the small number of replicates for each condition leads to low-precision probe-level measurements and an inaccurate variance estimate for each gene across replicates. These two factors make finding differentially expressed genes very difficult.

Many methods have been developed to combine replicated gene expression measurements for finding differentially expressed genes. The widely used Affymetrix GeneChips use multiple probes to interrogate gene expression profiles, which provides rich information about technical sources of noise. Probability is a natural representation of uncertainty and is suitable for describing the noisy nature of gene expression data. A recently proposed probabilistic method, PPLR, considers both gene expression values and probe-level measurement error, and improves the accuracy of finding differential gene expression. However, PPLR uses an importance sampling procedure within its variational EM algorithm, which reduces computational efficiency. This thesis modifies the original PPLR model to obtain an improved model, IPPLR.

The IPPLR model uses a Bayesian hierarchical framework and considers both gene expression values and their uncertainty. IPPLR adds hidden variables to represent the true gene expression levels, and a variational EM algorithm is used to estimate the model parameters. IPPLR yields standard distributions for all of the hidden variables, which eliminates the importance sampling procedure required by PPLR.

IPPLR was validated on both a benchmark Golden Spike-in dataset and a real-world Mouse Embryo dataset. The results demonstrate that IPPLR improves both accuracy and computational efficiency in finding differentially expressed genes. To further assess computational efficiency on large datasets, the Mouse Hair and Mouse Colitis datasets were chosen to compare IPPLR with PPLR. The results demonstrate that IPPLR markedly improves computational efficiency, especially as the number of chips increases.

IPPLR has been implemented in an R package, ipplr, which is currently available from http://parnec.nuaa.edu.cn/liux/zhangl, and has been merged into the puma package, which will be released with the next version of Bioconductor.
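The idea of adding hidden variables for the true expression levels, so that every hidden variable has a standard distribution and no sampling is needed, can be illustrated with a much simpler toy model. The sketch below is not the IPPLR model and is not taken from the ipplr or puma packages. It assumes, purely for illustration, that each replicate measurement y_r is Gaussian around a hidden true value x_r with a known measurement variance v_r, and that the x_r are Gaussian around a condition mean mu with variance sigma2. The function name, the variable names, and the example numbers are all hypothetical.

```python
def em_true_expression(y, v, n_iter=200, floor=1e-8):
    """Toy EM for y_r ~ N(x_r, v_r), x_r ~ N(mu, sigma2), with known v_r.

    Illustrative assumption only; this is not the IPPLR model.
    """
    n = len(y)
    # Initialise from the plain (unweighted) sample mean and variance.
    mu = sum(y) / n
    sigma2 = max(sum((yi - mu) ** 2 for yi in y) / n, floor)
    for _ in range(n_iter):
        # E-step: the posterior of each hidden x_r is Gaussian in closed form,
        # so no importance sampling is required.
        post_var = [1.0 / (1.0 / sigma2 + 1.0 / vi) for vi in v]
        post_mean = [pv * (mu / sigma2 + yi / vi)
                     for pv, yi, vi in zip(post_var, y, v)]
        # M-step: update mu and sigma2 from the posterior moments.
        mu = sum(post_mean) / n
        sigma2 = max(
            sum((m - mu) ** 2 + pv for m, pv in zip(post_mean, post_var)) / n,
            floor,
        )
    return mu, sigma2, post_mean, post_var


# Hypothetical log-expression values for four replicates; the last replicate
# has a much larger measurement variance than the others.
y = [7.9, 8.1, 8.0, 11.0]
v = [0.05, 0.05, 0.05, 4.0]
mu, sigma2, post_mean, post_var = em_true_expression(y, v)
print("estimated mean:", mu)
print("plain average :", sum(y) / len(y))
```

In this toy setting the noisy fourth replicate is down-weighted, so the estimated mean sits below the plain average of the four values. This is the general reason for carrying probe-level measurement error through the analysis rather than treating all replicates as equally reliable.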
Keywords/Search Tags: microarray, gene expression data analysis, differentially expressed gene, probabilistic model, variational approximation method, computational efficiency