Computational mutagenesis using transduction, active learning, and association rule mining

Posted on: 2013-05-19
Degree: Ph.D.
Type: Dissertation
University: George Mason University
Candidate: Basit, Nada
Full Text: PDF
GTID: 1458390008483109
Subject: Biology
Abstract/Summary:
Wet laboratory mutagenesis to determine enzyme mutant activity or pathology induced by non-synonymous single nucleotide polymorphisms (nsSNPs) is expensive and time-consuming. Automating such prediction tasks motivates in silico methods, i.e., computational mutagenesis. The computational methods used in this dissertation are driven by transduction, active learning, and association rule mining. The bioinformatics tasks are paired with the novel computational mutagenesis methods as follows: (1) protein function prediction using transduction; (2) protein function prediction using transduction and active learning; and (3) prediction of nsSNP-induced pathology using transduction and active learning combined with association rule mining. The feasibility and comparative advantage of these methods are shown on predicting the activity of mutants (single amino acid polymorphisms) of the HIV-1 Protease (HIV-1), Bacteriophage T4 Lysozyme (T4), and Lac Repressor (LAC) proteins, and on predicting nsSNP-induced pathology on an nsSNP data set composed of a large number of proteins. The problem of unbalanced populations, where the proportion of examples in the data set belonging to each class is uneven, is addressed using (a) stratified sampling with cross-validation operating on folds that are identical in class distribution; and (b) random over-sampling to boost the minority class and make it equal in size to the majority class. The annotation problem is a by-product of incremental transduction and active learning. The novel methods proposed in this dissertation perform better than state-of-the-art methods in terms of prediction performance (Tasks 1, 2, and 3), amount of annotation used (size of training data) (Tasks 2 and 3), and explanation (knowledge) gained (Task 3).
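The two class-imbalance remedies named in the abstract, stratified cross-validation folds and random over-sampling of the minority class, can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the function names `stratified_folds` and `random_oversample`, the toy labels, and the choice to over-sample only the training portion of each split are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split example indices into k folds with near-identical class proportions.

    Hypothetical helper: each class is shuffled and dealt round-robin
    across the folds, so every fold mirrors the overall class mix.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def random_oversample(indices, labels, seed=0):
    """Duplicate randomly chosen examples of smaller classes (with replacement)
    until every class is as large as the largest one.

    Hypothetical helper; returns a list of indices that may contain repeats.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)

    target = max(len(members) for members in by_class.values())
    balanced = []
    for label in sorted(by_class):
        members = by_class[label]
        extra = [rng.choice(members) for _ in range(target - len(members))]
        balanced.extend(members + extra)
    return balanced


# Toy data (made up): 8 "active" mutants and 4 "inactive" ones.
labels = ["active"] * 8 + ["inactive"] * 4
folds = stratified_folds(labels, k=4)

for held_out in range(len(folds)):
    # Assumption: only the training portion is over-sampled, so the
    # held-out fold keeps the original class distribution.
    train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    balanced_train = random_oversample(train, labels)
```

Over-sampling after the split, rather than before it, keeps duplicated minority examples from appearing in both the training and held-out folds; the abstract does not state which order the dissertation uses.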
Keywords/Search Tags:Active learning, Using transduction, Mutagenesis, Tasks, Prediction, Association