
A novel statistical method for microarray data integration: Applications to cancer research

Posted on: 2008-06-04
Degree: Ph.D.
Type: Thesis
University: The Johns Hopkins University
Candidate: Xu, Lei
GTID: 2444390005958111
Subject: Biology
Abstract/Summary:
DNA microarray technology has had a tremendous impact on cancer research. Microarray gene expression data have been widely used to identify cancer gene signatures that could complement conventional histopathologic evaluation and increase the accuracy of cancer diagnosis and prognosis. However, individual studies have limited sample sizes, so gene signatures for the same cancer obtained in different studies often overlap only slightly. With the rapid accumulation of microarray data, there is great interest in combining data across studies of similar cancers to increase sample size. Doing so could lead to more reliable gene signatures for specific cancers.

The aim of this thesis is to propose a novel statistical method for inter-study microarray data integration, based on the top-scoring pair(s) (TSP) classifier family. We also apply it to two key problems in cancer research: the identification of robust diagnostic signatures and of robust prognostic signatures.

We develop the rank-based TSP classifier family, comprising the TSP, k-TSP, and GTSP classifiers. These classifiers use only the rank order of gene expression values within each profile. They are therefore invariant to most data normalization methods, namely those that preserve the ordering of expression values within a profile. This property makes them particularly useful for integrating microarray data across studies, because it removes the need for cross-study data normalization and transformation. By incorporating the TSP method into different statistical models, we can identify robust gene signatures for cancer diagnosis and prognosis from integrated microarray data. To illustrate the efficacy of these models, we apply them to a large collection of microarray data and identify several important gene signatures for cancer diagnosis and prognosis.
All of these signatures are validated on independent microarray data.

Our work not only establishes new models for identifying robust gene expression signatures from accumulated microarray data. It also demonstrates how the great wealth of microarray data can be exploited to increase the power of statistical analyses. These models will become increasingly useful as more microarray data are generated and made publicly available in the near future. With the inclusion of more samples, cancer gene signatures will be continuously refined, and consensus signatures will eventually be reached.
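The rank-based idea behind the TSP family can be illustrated with a short sketch. The sketch makes two assumptions. First, it uses the TSP score as commonly defined in the classifier literature, Delta_ij = |P(X_i < X_j | class 0) - P(X_i < X_j | class 1)|. Second, it uses the k-TSP rule of majority voting over the k best-scoring disjoint gene pairs. The function names (`pair_stats`, `train_ktsp`, `predict`) and the toy data are hypothetical. This is not the thesis's own implementation, and GTSP is not shown. Because the rule only compares genes within a profile, an order-preserving transform of a study's profiles should leave the selected pair unchanged.

```python
import math
from itertools import combinations

# Hypothetical helpers: a sketch of the published TSP / k-TSP rules,
# not the thesis's own code.

def pair_stats(X, y, i, j):
    """Return (Delta_ij, label predicted when x[i] < x[j]) for gene pair (i, j).

    Delta_ij = |P(X_i < X_j | y=0) - P(X_i < X_j | y=1)|, estimated by the
    within-class frequency of the event. Only the ordering of x[i] and x[j]
    inside each profile is used, never the raw expression values.
    """
    probs = []
    for label in (0, 1):
        rows = [x for x, c in zip(X, y) if c == label]
        probs.append(sum(r[i] < r[j] for r in rows) / len(rows))
    label_if_less = 0 if probs[0] > probs[1] else 1
    return abs(probs[0] - probs[1]), label_if_less


def train_ktsp(X, y, k=1):
    """Pick the k highest-scoring disjoint gene pairs (k=1 is plain TSP).

    Ties are broken by pair order here, a simplification of the secondary
    tie-breaking score used in the published method.
    """
    scored = []
    for i, j in combinations(range(len(X[0])), 2):
        delta, label_if_less = pair_stats(X, y, i, j)
        scored.append((delta, i, j, label_if_less))
    scored.sort(key=lambda t: -t[0])  # stable sort: earlier pairs win ties
    chosen, used = [], set()
    for _, i, j, label_if_less in scored:
        if i in used or j in used:
            continue  # k-TSP pairs must not share a gene
        chosen.append((i, j, label_if_less))
        used.update((i, j))
        if len(chosen) == k:
            break
    return chosen


def predict(pairs, x):
    """Majority vote over the selected pairs (an odd k avoids tied votes)."""
    votes = [lbl if x[i] < x[j] else 1 - lbl for i, j, lbl in pairs]
    return 1 if 2 * sum(votes) > len(votes) else 0


# Invented toy profiles: rows are samples, columns are 4 genes.
X = [
    [1.0, 5.0, 3.0, 2.0],
    [2.0, 6.0, 1.0, 4.0],
    [1.5, 4.0, 2.0, 3.0],
    [7.0, 2.0, 3.0, 1.0],
    [6.0, 1.0, 2.5, 4.0],
    [8.0, 3.0, 1.0, 2.0],
]
y = [0, 0, 0, 1, 1, 1]

# A second "study" with different, order-preserving preprocessing
# (log2 followed by rescaling and shifting).
X_other = [[math.log2(v) * 2.0 + 7.0 for v in row] for row in X]

tsp = train_ktsp(X, y, k=1)
print(tsp)                                 # [(0, 1, 0)]
print(train_ktsp(X_other, y, k=1) == tsp)  # True
print(predict(tsp, [2.0, 9.0, 4.0, 1.0]))  # 0
print(predict(tsp, [5.0, 1.0, 2.0, 3.0]))  # 1
```

The second study's values differ from the first by a log transform plus rescaling. Training on either study still selects the same gene pair. This is the kind of invariance that allows profiles with different preprocessing to be compared without first renormalizing them. For k > 1, choosing an odd k keeps the majority vote well defined.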
Keywords/Search Tags: Cancer, Data, Microarray, Gene, Statistical, Method, TSP