Improving statistical methods in biological pathway analysis

Posted on: 2011-12-26
Degree: Ph.D.
Type: Dissertation
University: Southern Methodist University
Candidate: McClellan, Elizabeth A
Full Text: PDF
GTID: 1444390002454635
Subject: Statistics
Abstract/Summary:
The integrated analysis of genetic and biological pathway data is crucial to the understanding of systems biology. The components of biological processes are affected by the expression levels of genes, which control the production of proteins. The presence or absence of specific proteins leads to disruptions in metabolic or signaling pathways, which in turn affect the stability of an organism's biological system. Identifying where differentially expressed genes control the behavior of reactions or signals in pathways is a computationally and biologically complex statistical challenge.

Many statistical methods are available to quickly ascertain which genes studied in an experiment are differentially expressed (DE) between biological conditions. DE genes can then be mapped to biological pathways or networks to discover where they influence reactions and signals. Statistical methods developed to make such discoveries determine where DE genes are over-represented in pathways, but they generally do not account for the structure of those pathways. This omission of biologically relevant information is a crucial mistake made both by the statisticians who develop the methods and by the biologists who use them. Over-representation methods also exhibit a sample size bias: small p-values are not easily obtained for pathways involving few genes. Additionally, valuable information is lost when gene p-values, z-scores, fold changes, or other measures are dichotomized. The direction and magnitude of a gene measure provide more evidence of true differential expression and should be included in any pathway analysis.

This dissertation summarizes several existing pathway analysis methods, including over-representation methods, methods that utilize other available gene measures, and methods that incorporate pathway structure. Because the pathways analyzed in most methods are subjectively defined by their starting and stopping points, which may contribute to incorrect results, a reconstruction of pathways is proposed. A new method called Weighted-Averages for Reconstructed Pathways (Path WeAveRs) is introduced for use on the newly rearranged "pathways". The statistical performance and biological relevance of Path WeAveRs are investigated and compared with those of other methods. Results indicate that Path WeAveRs produces biologically meaningful pathways and is a viable alternative to existing pathway analysis methods.
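
The sample size bias attributed to over-representation methods can be illustrated with a small sketch. It assumes the over-representation test is a one-sided hypergeometric test (a common choice, though the abstract does not name a specific test), and the gene counts are hypothetical values chosen only for illustration. The function name `hypergeom_upper_tail` is likewise an assumption, not something taken from the dissertation.

```python
from math import comb


def hypergeom_upper_tail(k, n_genes, n_de, pathway_size):
    """P(X >= k): chance of seeing at least k DE genes in a pathway of
    `pathway_size` genes drawn from `n_genes`, of which `n_de` are DE."""
    total = comb(n_genes, pathway_size)
    upper = min(n_de, pathway_size)
    tail = sum(
        comb(n_de, i) * comb(n_genes - n_de, pathway_size - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Hypothetical experiment: 10,000 genes measured, 500 called DE.
N_GENES, N_DE = 10_000, 500

# Best case for each pathway: every one of its genes is DE.
# Even then, the smallest attainable p-value is bounded by pathway size.
for size in (1, 2, 3, 5, 10):
    p_min = hypergeom_upper_tail(size, N_GENES, N_DE, size)
    print(f"pathway size {size:2d}: smallest attainable p-value = {p_min:.3g}")
```

Under these assumptions, a one-gene pathway can never reach a p-value below the overall DE fraction (500/10,000 = 0.05), no matter how strongly its gene is expressed, while larger pathways can reach much smaller values. After a multiple-testing correction across many pathways, small pathways are therefore unlikely to be flagged.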
Keywords/Search Tags: Biological, Methods, Pathway