Research on the Application of Deep Learning to the Prediction of Protein Secondary Structure

Posted on: 2016-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: A S Zhang
GTID: 2180330461992484
Subject: Computer application technology
Abstract/Summary:
With the completion of the Human Genome Project (HGP), the focus of biological science has shifted from genomics to proteomics, and humanity has entered the post-genome era. Proteomics research focuses on the structure and function of proteins, and because a protein's structure determines its function, protein structure has become the central topic of the post-genome era. At the same time, thanks to steady progress in protein sequencing, more and more primary structures (amino acid sequences) have been determined and stored in the major biological databases, whose contents are growing exponentially. Experimental determination of spatial structure, by contrast, advances slowly because of various limiting factors. As a result, for a growing number of proteins the primary structure is known but the spatial structure is not. Computational prediction of protein structure was proposed to address this gap and is now widely used.

In practice, it is difficult to predict the spatial structure of a protein directly from its primary structure, so the secondary structure is introduced as an intermediate level. As a transition between the primary and spatial structures, the secondary structure describes the local spatial structure of a protein. Structure prediction is therefore divided into two directions: from primary structure to secondary structure, and from secondary structure to spatial structure. The former is the focus of research, and the main goal of this paper is to predict the secondary structure of a protein from its primary structure.

Prediction proceeds in two stages: a coding stage and a model prediction stage. The first, also called feature coding or feature extraction, encodes each protein sequence into a fixed-length feature vector according to a chosen coding scheme. The second constructs an appropriate predictive model and uses the encoded sequences for training and prediction. This paper improves both stages and proposes a protein prediction method based on deep learning.

For feature coding, this paper proposes a feature extraction method based on pseudo amino acid composition (PseAA) that encodes a protein sequence into a 30-dimensional feature vector. The vector includes the amino acid composition (AAC) of the protein, an approximate entropy feature that describes local sequence information, a hydrophobicity pattern feature that describes the physicochemical properties of the sequence, and an image-based homology feature that describes the protein sequence.

The prediction model is a standard deep learning model, a deep belief network (DBN), which consists of a five-layer network and a classification network. Each pair of adjacent layers forms a restricted Boltzmann machine (RBM), giving four RBMs in total, and classification uses a softmax classifier. Training has two phases: bottom-up pretraining and top-down fine-tuning. Pretraining is unsupervised and uses greedy layer-wise training, so that each RBM is trained in turn from the bottom up. Fine-tuning uses error back propagation (BP) to adjust the network parameters from the top down.

The results show that the proposed prediction model is feasible and effective. In particular, for protein sequences with low homology, its prediction accuracy is comparable to that of the best current predictive models.
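
The coding stage and the greedy layer-wise pretraining described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the thesis. Only the 20-dimensional amino acid composition part of the feature vector is computed; the approximate entropy, hydrophobicity pattern, and image-based homology features are omitted. The hidden layer sizes, learning rate, epoch count, and toy sequences are made up for the example, and the names `amino_acid_composition`, `RBM`, and `pretrain_stack` are hypothetical. Supervised fine-tuning with a softmax classifier and back propagation is left out.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the 20-D vector of residue frequencies for one sequence."""
    counts = np.array([sequence.count(aa) for aa in AMINO_ACIDS], dtype=float)
    total = counts.sum()
    return counts / total if total > 0 else counts


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Bernoulli RBM trained with one step of contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_visible = np.zeros(n_visible)
        self.b_hidden = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_hidden)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_visible)

    def cd1_update(self, v0, lr=0.1):
        # Positive phase: hidden activations driven by the data.
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one reconstruction step.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_visible += lr * (v0 - v1).mean(axis=0)
        self.b_hidden += lr * (h0 - h1).mean(axis=0)


def pretrain_stack(data, layer_sizes, epochs=5, seed=0):
    """Greedy layer-wise pretraining: each RBM is trained on the layer below."""
    rng = np.random.default_rng(seed)
    rbms, layer_input = [], data
    for n_visible, n_hidden in zip(layer_sizes[:-1], layer_sizes[1:]):
        rbm = RBM(n_visible, n_hidden, rng)
        for _ in range(epochs):
            rbm.cd1_update(layer_input)
        rbms.append(rbm)
        layer_input = rbm.hidden_probs(layer_input)  # feed activations upward
    return rbms, layer_input


# Made-up toy sequences, for illustration only.
sequences = ["MKTAYIAKQR", "GAVLIPFMWG", "ACDEFGHIKL"]
features = np.stack([amino_acid_composition(s) for s in sequences])

# Five layers give four RBMs; the sizes after the 20-D input are assumptions.
rbms, top = pretrain_stack(features, layer_sizes=[20, 16, 12, 8, 4])
print(len(rbms), top.shape)
```

In a full pipeline, the top-layer activations would feed a softmax classifier, and all weights would then be fine-tuned with back propagation on labeled data.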
Keywords/Search Tags:deep learning, protein structure prediction, prediction of secondary structure, protein structural class, pseudo amino acid, deep belief network