Automatic annotation of multiple protein sequence alignments using recurrent neural networks

Posted on: 2007-01-23
Degree: M.C.Sc
Type: Thesis
University: Dalhousie University (Canada)
Candidate: Aggarwal, Aditya
Full Text: PDF
GTID: 2458390005988347
Subject: Computer Science
Abstract/Summary:
Manual annotation of multiple sequence alignments for phylogeny is a time-consuming and nontrivial task. In genomic-scale analyses especially, manually annotating hundreds or thousands of sequence alignments is not a practical option. To automate this process, we apply three neural network architectures, namely a multilayer feed-forward network, a recurrent neural network, and a bidirectional recurrent neural network, to detect regions of intrinsically poor alignment. Parameters were generated on a set of manually annotated Pfam multiple protein sequence alignments, which formed the training and testing data for the networks. The system is designed to distinguish noisy sites (the inadequate class) from informative sites (the valid class). Of the three architectures, the multilayer feed-forward network with no window size gave the highest classification precision for the valid class, at 94.6%. The best performance for the prediction of inadequate sites, 92.78%, was obtained with the bidirectional recurrent neural network. The resulting classifiers can annotate multiple sequence alignments for the purpose of editing. The method is especially useful as a pre-processor for phylogenetic analyses at the genomic scale.
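As an illustration of how a per-class precision figure such as the one quoted for the valid class can be computed, the following is a minimal sketch and not the thesis's evaluation code. The function name `class_precision`, the label strings, and the six example sites are all hypothetical; the sketch only assumes that each alignment site carries one manual label and one predicted label.

```python
def class_precision(true_labels, predicted_labels, target):
    """Precision for one class: correct predictions of `target`
    divided by all predictions of `target`.

    Returns None when the classifier never predicts `target`,
    since precision is undefined in that case.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")

    # Manual labels of every site the classifier assigned to `target`.
    predicted_as_target = [
        t for t, p in zip(true_labels, predicted_labels) if p == target
    ]
    if not predicted_as_target:
        return None

    correct = sum(1 for t in predicted_as_target if t == target)
    return correct / len(predicted_as_target)


# Hypothetical annotations for six alignment sites (illustrative only).
true_labels = ["valid", "valid", "inadequate", "valid", "inadequate", "inadequate"]
predicted = ["valid", "valid", "valid", "valid", "inadequate", "inadequate"]

valid_precision = class_precision(true_labels, predicted, "valid")
inadequate_precision = class_precision(true_labels, predicted, "inadequate")
```

Computing the figure separately for each class matters here because a classifier can be precise on valid sites while still mislabelling inadequate ones, which is why the two classes are reported separately.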
Keywords/Search Tags:Sequence alignments, Multiple, Recurrent neural, Network