Protein secondary structure prediction is a challenging and important research topic in computational bioinformatics. It supports protein function research and tertiary structure prediction, promotes the design and development of new drugs, and has an important impact on biology, medicine and pharmacy. Because experimental methods for determining protein secondary structure are time-consuming and expensive, continuously improving machine learning techniques are widely used to predict it instead. However, existing methods with deep architectures do not extract deep long-range features from long protein sequences sufficiently or comprehensively. To this end, this thesis conducts an in-depth study of 3-state and 8-state protein secondary structure prediction based on integrated deep models. The main work of this research can be divided into the following aspects:

(1) This study combines a Wasserstein generative adversarial network with gradient penalty (WGAN-GP), a temporal convolutional network (TCN) and a convolutional attention block to propose a protein secondary structure prediction model named WGACSTCN. In the proposed model, the adversarial game between the generator and the discriminator in the WGAN-GP module effectively extracts protein features. The local feature extraction module, a TCN with a convolutional attention block (CBAM-TCN), captures the key deep local interactions in protein sequences segmented by the sliding window technique. The CBAM-TCN long-range feature extraction module then captures the key deep long-range interactions in the protein sequence and completes the secondary structure prediction. The model is evaluated on public test sets including CASP10, CASP11, CASP12, CASP13, CASP14 and CB513. Experimental results show that the proposed method achieves better prediction performance than four popular methods. The proposed model has strong feature extraction ability and extracts important residue information more comprehensively.

(2) This study proposes DLBLS_SS, a protein secondary structure prediction model based on deep learning and broad learning. DLBLS_SS first uses a bidirectional long short-term memory (BLSTM) network to extract global features in residue sequences, then uses the proposed bidirectional TCN with channel attention (SEBTCN) to capture key bidirectional long-range dependencies in residue sequences, and finally uses the broad learning system (BLS) to quickly optimize the fused features while further capturing the local interactions between amino acid residues and using the locally optimal features for classification. Extensive experiments on seven public benchmark datasets show that DLBLS_SS outperforms five state-of-the-art models.

(3) This study proposes an integrated deep model based on BLSTM and bidirectional temporal convolution for protein secondary structure prediction. In this model, the proposed bidirectional TCN (BTCN) extracts bidirectional deep local dependencies in protein sequences segmented by the sliding window technique, the BLSTM extracts global interactions between amino acid residues, and the proposed multi-scale BTCN (MSBTCN) further captures bidirectional multi-scale long-range features in residues while retaining hidden layer information more comprehensively. In particular, this study proposes fusing 3-state and 8-state secondary structure prediction features, which further improves prediction accuracy and provides a new idea for secondary structure prediction. Moreover, this study proposes and compares multiple integrated deep models that combine BLSTM with TCN, reverse TCN (RTCN), multi-scale TCN (MSTCN), BTCN and MSBTCN, respectively. Furthermore, this study finds and demonstrates for the first time that reverse prediction of secondary structure is superior to forward prediction, suggesting that amino acids at later positions have a greater impact on secondary structure recognition. Experimental results on seven public benchmark datasets show that the proposed integrated deep models achieve better performance in 3-state and 8-state secondary structure prediction than five state-of-the-art methods.
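
The sliding window segmentation and the forward versus reverse ordering mentioned above can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `sliding_windows`, the padding symbol `X`, the window size of 5 and the example sequence are all assumptions chosen for illustration, and "reverse" is read here simply as presenting the residues in reversed order before segmentation.

```python
from typing import List

PAD = "X"  # hypothetical padding symbol for positions beyond the sequence ends


def sliding_windows(sequence: str, window: int, reverse: bool = False) -> List[str]:
    """Return one fixed-size window per residue, centred on that residue.

    The window size must be odd so that each window has a single centre.
    With reverse=True the sequence is reversed first, which is one possible
    reading of "reverse prediction" (an assumption, not the thesis definition).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if reverse:
        sequence = sequence[::-1]
    half = window // 2
    # Pad both ends so the first and last residues also get full windows.
    padded = PAD * half + sequence + PAD * half
    return [padded[i:i + window] for i in range(len(sequence))]


# Illustrative sequence only; not taken from any benchmark dataset.
forward = sliding_windows("MKTAYIA", window=5)
backward = sliding_windows("MKTAYIA", window=5, reverse=True)
```

Each window would then be encoded (for example as a one-hot or profile matrix) and passed to a local feature extractor, while the full sequence feeds the long-range modules. Padding both ends keeps the number of windows equal to the sequence length, so every residue receives exactly one prediction.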