Identification Of Protein Coding Region Based On Artificial Neural Networks

Posted on: 2019-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: D Wang
Full Text: PDF
GTID: 2370330572452117
Subject: Computer application technology
Abstract/Summary:
Proteins are an essential component of living beings and carry out most of life's activities. Locating protein coding regions within a gene sequence is therefore crucial to the study of those activities. Genes differ in length, in the number of coding regions they contain, and in the length of each coding region; moreover, the characteristics that distinguish coding regions from non-coding regions are not clear. Together, these factors make protein coding regions difficult to identify. To address this, a recognition model of protein coding regions based on artificial neural networks is proposed. The self-organizing ability of neural networks is used to automatically extract features of the coding and non-coding regions of known proteins, and those features are then used to identify coding regions in unknown genes.

This thesis proposes six solutions for identifying protein coding regions, which fall into two types: basic recognition models built on three network structures (MLP, CNN, and RNN), and integrated recognition models based on voting, re-learning, and model merging. First, a recognition model based on MLP is proposed; after theoretical analysis and experiments, a structure with one hidden layer was selected. Second, because a CNN can extract the main features of a sample and reduce the number of model parameters through operations such as weight sharing and pooling, a CNN-based recognition model with two convolutional layers and two pooling layers is proposed. Third, because gene sequences share characteristics with time series, and RNNs handle time-series problems well, an RNN-based recognition model is proposed. Finally, to improve the accuracy of coding region identification, three integrated recognition models based on voting, re-learning, and model merging are constructed, exploiting the differences among the MLP, CNN, and RNN models.

A comparison of the accuracy, reliability, and running time of the MLP, CNN, and RNN recognition models shows that the RNN takes the longest to identify coding regions but achieves the best accuracy of the three. The three integrated models, which combine MLP, CNN, and RNN, each perform better than any of the basic recognition models. The accuracy rates of the three integrated models are 90.84%, 90.72%, and 89.99%, demonstrating the effectiveness of the integrated approach.
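The abstract does not say how the voting-based integrated model combines its members. As a rough illustration only, the sketch below assumes simple majority voting over binary labels (1 = coding, 0 = non-coding) produced by three already-trained models. The function name `majority_vote` and the sample predictions are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model label lists into one label per sample by majority vote.

    Assumption: each inner sequence holds one model's predicted labels,
    and all models were run on the same samples in the same order.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    for sample_votes in zip(*predictions):
        # Pick the label that most models agree on for this sample.
        label, _count = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs of the three basic models (1 = coding, 0 = non-coding).
mlp_pred = [1, 0, 1, 1, 0]
cnn_pred = [1, 1, 0, 1, 0]
rnn_pred = [0, 0, 1, 1, 0]

ensemble_pred = majority_vote([mlp_pred, cnn_pred, rnn_pred])
print(ensemble_pred)  # [1, 0, 1, 1, 0]
```

With an odd number of models and two classes, a tie cannot occur, which is one reason three base models are a convenient choice for this kind of scheme. The re-learning and model-merging variants would need trained networks and are not sketched here.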
Keywords/Search Tags:MLP, CNN, RNN, Integrated Learning, Protein Coding Region