
Cloud-scale Genomic Signal Processing for Robust Microarray Data Analysis

Posted on: 2016-08-24 | Degree: D.S | Type: Dissertation
University: Bowie State University | Candidate: Harvey, Benjamin S
GTID: 1478390017478037 | Subject: Computer Science
Abstract/Summary:
As the microarray data available to scientists continue to grow in size and complexity, it has become critically important to find multiple ways of drawing oncological inferences, useful to the bioinformatics community, from Large-Scale Cancer Genomic (LSCG) DNA and mRNA microarray data. Although there have been many attempts to address the problem of biological interpretation through wavelet preprocessing and classification, no research effort has focused on Cloud-Scale Distributed Parallel (CSDP) one-dimensional wavelet decomposition and denoising combined with differential expression thresholding and parameterized classification of LSCG microarray data. This study presents a novel methodology that uses CSDP one-dimensional wavelet-based denoising to initialize a threshold for identifying significantly expressed genes, which are then used to classify cancer disease susceptibility. Additionally, this research introduces a "Tipping Point" gauge (TPG) for adaptive selection of the optimal wavelet decomposition level. Finally, this study uses the TPG's image quality performance values to parameterize machine learning for optimal classification. The entire study was implemented within a CSDP environment. Cloud computing and wavelet-based thresholding were used to classify samples from three LSCG data sets: the Global Cancer Map (GCM), the Cancer Cell Line Encyclopedia (CCLE), and The Cancer Genome Atlas (TCGA). The results showed that one-dimensional parallel cloud-scale wavelet denoising, differential expression thresholding, TPG adaptive level selection, and machine learning parameterization increased computational performance and produced higher-quality LSCG microarray images, which led to more accurate classification results.

The purpose of this investigation is to present a novel CSDP approach for extracting important but hidden patterns in LSCG microarray data sets that can be used for patient classification. The main objective of this study is to investigate a gauged machine learning approach to Genomic Signal Processing (GSP) thresholding and denoising that enhances the quality of LSCG gene expression microarrays for cloud-scale analysis. GSP techniques were used to transform preprocessed coefficients within a CSDP environment, with image quality error and correlation properties serving as measures of the robustness of the analysis. The central focus of this study is therefore the use of CSDP signal preprocessing as a "Tipping Point" gauge for threshold determination: it analyzes the trade-offs among noise, error, and correlation in large-scale expression microarrays, using machine learning classification as the measure of robustness.

Although there have been many attempts to determine the balance between microarray gene resolution (MGR) and accuracy, the "Tipping Point" has not been analyzed with regard to determining an optimum threshold for identifying levels of gene expression within a CSDP environment. This dissertation therefore proposes a robust parallel, distributed, cloud-based denoising methodology that uses a parallel wavelet transform to identify a threshold for retaining significantly expressed features. The method yields a one-dimensional wavelet-based technique for extracting significant gene expression features in a distributed cloud environment, and it provides a predictive model of microarray data by analyzing gene patterns that inform us about biological processes.

This study uses parallelization as a mechanism to increase the robustness of denoising, thresholding, and classification by distributing the microarray data across processors during decomposition. Moreover, by facilitating timely, effective, and efficient analysis of data and functional genomics, the proposed CSDP GSP analysis of LSCG microarray data sets will help researchers meet present and future challenges in cancer research. Furthermore, the study establishes a mechanism for generating optimal microarray data sets for precise genetic and genomic tumor/cancer classification.

Ultimately, this dissertation presents theoretical and empirical work implementing a CSDP environment for various Genomic Signal Processing methodologies, using machine learning to determine the robustness of the strategies. This study also presents a resourceful approach to parallel wavelet decomposition and thresholding for denoising genes in microarray data sets, and an innovative conceptual framework for a hybrid virtualized public and private cloud model for distributed, scalable, and parallel processing.
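The one-dimensional wavelet denoising and adaptive level selection summarized above can be illustrated with a small sketch. It is not the dissertation's implementation, and every name in it is hypothetical. It assumes a Haar wavelet, soft thresholding with the standard Donoho-Johnstone universal threshold (noise scale estimated from the finest detail band), and a hypothetical "tipping point" rule (`select_level`) that keeps the deepest decomposition level whose denoised profile still correlates with the input at or above a chosen cutoff. A local thread pool stands in for distributing expression profiles across cloud workers.

```python
import math
import statistics
from concurrent.futures import ThreadPoolExecutor

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One level of the orthonormal Haar DWT (len(x) must be even)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one Haar level, interleaving each reconstructed pair."""
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / SQRT2, (a - d) / SQRT2))
    return out


def soft(c, t):
    """Soft thresholding: shrink coefficient c toward zero by t."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def denoise(signal, level):
    """Decompose to `level`, soft-threshold every detail band, reconstruct."""
    n = len(signal)
    if level < 1 or n % (2 ** level):
        raise ValueError("need level >= 1 and len(signal) divisible by 2**level")
    approx, details = list(signal), []
    for _ in range(level):
        approx, d = haar_step(approx)
        details.append(d)  # details[0] is the finest band
    # Noise scale from the finest band: median(|d|) / 0.6745.
    sigma = statistics.median(abs(c) for c in details[0]) / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(n))  # universal threshold
    for d in reversed(details):  # rebuild from the coarsest band upward
        approx = haar_inverse_step(approx, [soft(c, t) for c in d])
    return approx


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)


def psnr(ref, est):
    """Peak signal-to-noise ratio (dB), using the range of `ref` as peak."""
    mse = sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref)
    peak = max(ref) - min(ref)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def select_level(signal, max_level, min_corr=0.9):
    """Hypothetical tipping-point rule: deepest level (at least 1) whose
    denoised output still correlates with the input at >= min_corr."""
    best = 1
    for level in range(1, max_level + 1):
        if len(signal) % (2 ** level):
            break
        if pearson(signal, denoise(signal, level)) < min_corr:
            break
        best = level
    return best


def denoise_matrix(rows, level, workers=4):
    """Denoise each expression profile in parallel (threads stand in
    for distributed cloud workers in this sketch)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: denoise(row, level), rows))
```

For example, `denoise_matrix(profiles, select_level(profiles[0], 4))` denoises every profile at the level chosen for the first one. Soft thresholding is used here because it shrinks all surviving coefficients uniformly, which tends to give smoother reconstructions than hard thresholding; a real pipeline would more likely use a wavelet library and a distributed framework rather than hand-written Haar steps and threads.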
Keywords/Search Tags: Microarray data, Genomic signal processing, Cloud, CSDP environment, Parallel, Machine learning, Classification, Distributed