
Active learning for sequential screening and classification of molecular and genomic data

Posted on: 2008-11-01
Degree: Ph.D.
Type: Dissertation
University: Boston University
Candidate: Walker, Megon Jarmaine
GTID: 1448390005477934
Subject: Biology
Abstract/Summary:
Understanding molecular interactions is at the core of computational biology and includes problems such as characterizing protein-protein, protein-small molecule, protein-DNA, and protein-RNA binding events. These interactions are often elucidated by expensive, time-consuming assays in which candidate binders are screened against a target. The main aim of this dissertation is to improve the speed, cost, and overall efficiency of screening assays in the context of drug design and molecular systems biology.

Sequential screening is an iterative process of experimentation and model refinement. Target binding activity is determined for a sample of putative binders, the results are used to update a classification model, and subsequent binding experiments are chosen based on knowledge gained from previous screens. Incorporating machine learning strategies, both for designing experiments and for building a predictive binding model, can dramatically reduce the experimental cost and validation time required for sequential screening.

This dissertation evaluates the application of different active learning paradigms to sequential screening of molecular and genomic interactions. Active learning couples data acquisition with model building, rather than treating them separately as in supervised learning of a classifier from a static training set.

During virtual screening of compounds for protein-small molecule interactions, structure-activity relationships determined from characterized compounds inform further compound selection. To improve structure-activity models, knowledge-based sampling for subgroup discovery and biclustering for class-specific feature selection are introduced in the drug discovery context.
Both local pattern recognition methods can be adapted for iterative model refinement and for delineating important functional groups.

RNA interference is a promising new technology with the potential to uncover novel disease pathways and to guide target-based therapeutic efforts. During sequential screening for highly effective small interfering RNAs (siRNAs), active learning strategies reduce the experimentation required to achieve full target coverage by more than 50% compared with random screening.

Applying active learning to molecular and genomic sequential screening demonstrates the practical utility and flexibility of the method. Several real-world datasets were analyzed using various classifiers, bagged and boosted ensembles, and sample selection strategies. Active learning consistently outperforms random screening, underscoring the value of using the results of earlier screens to guide subsequent experimentation.
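The sequential screening loop described in the abstract (screen a batch, update the classifier, choose the next batch from what has been learned) can be sketched as below. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the nearest-centroid base learner, the query-by-bagging disagreement score, the committee size of 7, the synthetic two-feature pool, and the `assay` rule are all hypothetical stand-ins chosen only to keep the example self-contained. Setting `strategy="random"` gives the random-screening baseline that the abstract compares against.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Classifier = Callable[[Vector], int]


def nearest_centroid(xs: Sequence[Vector], ys: Sequence[int]) -> Classifier:
    """Hypothetical base learner: assign each point to the closest class centroid."""
    centroids = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = tuple(sum(col) / len(members) for col in zip(*members))

    def predict(x: Vector) -> int:
        return min(centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))
    return predict


def bagged_committee(xs, ys, n_members: int, rng: random.Random) -> List[Classifier]:
    """Bagging: train each member on a bootstrap resample of the labeled data."""
    committee = []
    for _ in range(n_members):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        committee.append(nearest_centroid([xs[i] for i in idx], [ys[i] for i in idx]))
    return committee


def disagreement(committee: List[Classifier], x: Vector) -> float:
    """1.0 when the 0/1 votes are evenly split, 0.0 when the committee is unanimous."""
    p = sum(member(x) for member in committee) / len(committee)
    return 1.0 - abs(2.0 * p - 1.0)


def sequential_screen(pool, assay, seed_idx, rounds, batch, strategy, rng) -> Dict[int, int]:
    """Screen a batch, refit the model on all results so far, pick the next batch."""
    labeled = {i: assay(pool[i]) for i in seed_idx}  # initial screen
    for _ in range(rounds):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        if strategy == "random":  # baseline: ignore everything learned so far
            picks = rng.sample(unlabeled, min(batch, len(unlabeled)))
        else:  # active learning: query where the bagged committee disagrees most
            keys = list(labeled)
            committee = bagged_committee([pool[i] for i in keys],
                                         [labeled[i] for i in keys], 7, rng)
            picks = sorted(unlabeled, key=lambda i: disagreement(committee, pool[i]),
                           reverse=True)[:batch]
        for i in picks:
            labeled[i] = assay(pool[i])  # run the (simulated) binding experiment
    return labeled


def assay(x: Vector) -> int:
    """Hypothetical ground truth: a candidate 'binds' if x0 + x1 > 0.5."""
    return int(x[0] + x[1] > 0.5)


# Usage on a synthetic candidate pool (illustrative data only).
rng = random.Random(0)
pool = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
seed = rng.sample(range(len(pool)), 10)
result = sequential_screen(pool, assay, seed, rounds=5, batch=10,
                           strategy="committee", rng=rng)
print(len(result), "candidates screened,", sum(result.values()), "binders found")
```

Querying the candidates on which a bagged committee disagrees most is only one common active learning heuristic; the abstract does not say which selection strategies were used, so another scoring rule (for example, favoring predicted binders) could be dropped into the same loop without changing its structure.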
Keywords/Search Tags: Active learning, Screening, Molecular, Interactions