Deep Generative Model For Protein Secondary Structure Design

Posted on: 2019-02-25 | Degree: Master | Type: Thesis
Country: China | Candidate: J Chang | Full Text: PDF
GTID: 2428330545952253 | Subject: Computer Science and Technology
Abstract/Summary:
At present, research on protein secondary structure focuses mainly on prediction, that is, inferring the secondary structure of a protein from its primary structure. In this thesis, we propose a different line of research: generating protein secondary structure sequences. Secondary structure generation is a theoretical innovation in the study of protein secondary structure and could also benefit bioengineering and biopharmaceuticals. Deep generative models have produced many results for images and text, but we found few related studies on generating biological sequences. We therefore use deep generative models to generate protein secondary structures, which is also an attempt to apply deep learning to biological sequence generation. To this end, we make the following contributions:

(1) We build a complete dataset of protein secondary structures. We first download protein data files from the PDB database and extract the secondary structure data from them. We then apply different preprocessing and encoding steps for different models, which yields a training set suited to each model.

(2) We show experimentally that a general LSTM is not suitable for generating protein secondary structures. We design and implement an LSTM network and find that the generated samples have a high repetition rate and poor diversity, and that their average precision is low while the variance of the precision is high. We therefore treat this experiment as a baseline for comparison and conclude that a new model suited to protein secondary structure generation is needed.

(3) We propose a new algorithm, ssp-SeqGAN, based on SeqGAN, for generating protein secondary structures. Like SeqGAN, ssp-SeqGAN combines a GAN with reinforcement learning. SeqGAN is a general model for generating discrete sequences and is not well suited to generating protein secondary structure sequences. We first improve the network structure of the discriminator by adding a batch normalization (BN) layer before the pooling layer and before the fully connected layer, which gives a model we call SeqGAN-BN. We then improve the pre-training method of SeqGAN-BN to obtain ssp-SeqGAN, which generates protein secondary structures with higher precision and lower precision variance. The main contribution of ssp-SeqGAN is a new method for constructing negative samples that are more diverse and more adversarial, which clearly improves the effect of pre-training.

The experimental results show that the precision of sequences generated by the general LSTM is only slightly higher than that of random generation. The precision of sequences generated by SeqGAN-BN is slightly higher than that of SeqGAN, while the precision of sequences generated by ssp-SeqGAN is much higher than that of SeqGAN. In addition, the precision variance of sequences generated by ssp-SeqGAN is lower, which demonstrates that ssp-SeqGAN can stably generate protein secondary structures with higher precision.

In summary, we propose a deep generative model, ssp-SeqGAN, for the design of protein secondary structure. For protein secondary structure sequence generation, ssp-SeqGAN is more effective than both the general sequence model LSTM and SeqGAN, a model for generating discrete sequences.
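The abstract compares models by the repetition rate of generated samples and by the mean and variance of their precision. The following is a minimal sketch of how such summary statistics could be computed, not the thesis's own code. It assumes a three-state alphabet (H, E, C), which the abstract does not specify, and it takes per-sample precision scores as given input because the abstract does not define how precision is measured. All function names are hypothetical.

```python
from collections import Counter
from statistics import mean, pvariance

# Assumed three-state alphabet (helix, strand, coil); the abstract does not
# say which secondary structure alphabet the thesis uses.
ALPHABET = "HEC"
TOKEN_TO_ID = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(sequence):
    """Map a secondary structure string to a list of integer token ids."""
    return [TOKEN_TO_ID[ch] for ch in sequence]


def repetition_rate(samples):
    """Fraction of samples that duplicate an earlier sample in the batch."""
    if not samples:
        return 0.0
    counts = Counter(samples)
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(samples)


def precision_summary(scores):
    """Mean and population variance of per-sample precision scores."""
    return mean(scores), pvariance(scores)


if __name__ == "__main__":
    generated = ["HHHHCCEE", "HHHHCCEE", "CCEEEECC", "HHCCHHCC"]
    print(encode(generated[0]))
    print(repetition_rate(generated))
    # Illustrative scores only; these are not results from the thesis.
    print(precision_summary([0.5, 0.5]))
```

Under these assumptions, a generator that collapses to a few outputs shows a high repetition rate, and an unstable generator shows a high precision variance, which matches the way the abstract describes the LSTM baseline.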
Keywords/Search Tags: Protein Secondary Structure, Deep Generative Model, RNN, GAN, Reinforcement Learning