Font Size: a A A

Study On Predicting Nucleotide-Binding Protein Using Deep Learning Approach

Posted on:2018-02-13Degree:MasterType:Thesis
Country:ChinaCandidate:H S LiFull Text:PDF
GTID:2310330542977405Subject:Computer technology engineering
Abstract/Summary:PDF Full Text Request
DNA-binding proteins and RNA-binding proteins(RNA-BPs)play pivotal roles in DNA replication,transcription,alternative splicing,RNA editing,methylating and many other biological functions.Predicting functions of these proteins from primary amino acids sequences is becoming one of the major challenges in functional annotation of genomes.With the development and application of high-throughput technologies,protein data is exponentially growing in biological database.More and more scholars and research devote themselves to discovering and exploring the significance of life from biological data.Recently,many computational methods have been proposed to predict whether a protein containing the DNA-binding or RNA-binding function.But traditional prediction methods often devote themselves to extracting physicochemical features from sequences but ignoring motif information and location information between motifs.Meanwhile,the small scale of data volumes and large noises in training data result in lower accuracy and reliability of predictions.In this paper,our work are as follows:(1)Many traditional feature extraction methods are analyzed firstly,such as physicochemical feature extraction from amino acids,n-gram method from sequence,a n-gram method after amino acids classification,a method combining physicochemical feature extraction and n-gram,an auto covariance method which is based on amino acids physicochemical features.Then some traditional machine learning methods were used to predict whether a protein sequence contains DNA-binding or RNA-binding function.(2)We propose a new deep learning based model to predict RNA-binding proteins from primary sequences secondly.The model utilizes two stages of convolutional neutral network to detect the function domain of protein sequences,and long short-term memory neural network to obtain the length-fixed feature representation of sequences.It overcomes more human intervention in feature selection procedure than in traditional machine learning method,since all features are learned automatically.The experimental results show its priority in processing large scale of sequence data.
Keywords/Search Tags:DNA-binding protein, RNA-binding protein, Convolutional Neural Network, Long-Short Term Neural Network, Deep Learning
PDF Full Text Request
Related items