
Statistical hypothesis testing and application to biological data

Posted on: 2007-12-14
Degree: Ph.D.
Type: Thesis
University: University of California, Berkeley
Candidate: Birkner, Merrill Dobbel
GTID: 2448390005460307
Subject: Biology
Abstract/Summary:
Recent developments in biological research, for instance genomics and proteomics, have created new statistical challenges for scientists. Hypothesis testing is of utmost importance when determining which genes are, for example, differentially expressed between two disease states. Using a multiple testing technique that accurately controls the false positive rate with adequate power is vital. This doctoral dissertation focuses on statistical hypothesis testing and is organized into seven chapters.

Chapter 1 introduces the concept of hypothesis testing and, in particular, provides a brief outline of multiple testing and the existing methodologies. Chapter 2 reviews a variety of multiple testing methods. In particular, this chapter focuses on the Pollard and van der Laan (2003) resampling-based multiple testing method, including a comparison to existing procedures.

Chapter 3 describes the newly proposed Empirical Bayes/TPPFP procedure, which, based on several simulation results, has been found to be more powerful and less conservative than existing multiple testing procedures.

The next two chapters apply the multiple testing procedure to two biological datasets. Chapter 4 describes an HIV-1 dataset which consists of reverse transcriptase and protease codon positions and an outcome of replication capacity. The goal of this analysis is to determine which codon mutations are univariately associated with viral replication. Chapter 5 describes a proteomic dataset which consists of mass spectrometry data from both acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) patients. The goal of this analysis is to correctly preprocess the spectral data and subsequently apply the multiple testing procedures to determine which proteins are differentially expressed between the two leukemia subtypes.

A general hypothesis technique is described and explored in Chapter 6. This general pathway testing procedure is applied to the previously described HIV-1 dataset. The procedure determines the significance of the specific model, which represents the relationship between viral codons (protease and reverse transcriptase) and replication capacity. Finally, Chapter 7 provides a conclusion and summary of the preceding chapters.
Keywords/Search Tags: Testing, Biological, Statistical, Chapter