| Protein secondary structure is an important topic in proteome projects: it not only reflects local protein structure but also helps guide and confirm protein function, enabling the future artificial production and customization of new protein therapeutic compounds. Research on protein secondary structure prediction has advanced considerably, but long-range interactions between amino acid residues remain difficult to capture, the extracted feature information is limited, and model performance is unstable across protein sequences of varying lengths, all of which limit prediction accuracy. To address these issues, this paper investigates protein secondary structure prediction methods based on machine learning and deep learning and constructs a series of network models that predict protein secondary structure effectively. The main contributions are as follows:
(1) A protein secondary structure prediction network model based on a Convolutional Neural Network (CNN) and a Generative Adversarial Network (GAN). The model feeds the protein sequence into the network as a numerical vector, learns and extracts features from a large number of amino acid samples through a CNN with 10 convolutional layers, and predicts the secondary structure from the position of each amino acid and its neighboring amino acids. To overcome the limitations of the CNN and its reliance on a single type of input feature, a prediction network model that fuses a GAN with the Position-Specific Scoring Matrix (PSSM) is proposed. The PSSM describes the amino acid preference at each sequence position by independently scoring the 20 amino acid types at that position; it carries rich evolutionary information and effectively enriches the feature information. The GAN is used to extract features, which are combined with the PSSM of the protein sequence to form the network input; iterating through the CNN then yields better accuracy and generalization.
(2) A protein secondary structure prediction network model integrated with the Self-Attention mechanism. Using a Self-Attention Enhanced Convolution (SAEC) network to predict protein secondary structure makes it possible not only to identify the most important parts of a given protein sequence but also to expand the receptive field of the network without additional computational cost. In addition, Self-Attention is introduced into the generator and discriminator of the GAN to relate local details to long-range information in the amino acid sequence, and to fully mine and apply the global information in the sequence so that more complex geometric constraints can be enforced. The network model integrating GAN and Self-Attention (SA-GAN) not only preserves the feature extraction ability of the discriminator but also strikes a good balance among long-range dependency modeling, statistical quality, and computational cost. The Two Timescale Update Rule (TTUR) and Spectral Normalization (SN) are used to stabilize GAN training, which effectively improves both the prediction accuracy for protein secondary structure and the robustness of the model.
(3) A protein secondary structure prediction model (GNN-SVM) based on a Graph Neural Network (GNN) and a Support Vector Machine (SVM). The model consists of two main parts. The GNN encodes the amino acid sequence into node vectors, converts it into a graph structure, and uses the node features to model the relationships between amino acid residues in the sequence, so as to better reflect the structural information of the protein. The SVM, in turn, classifies and predicts the secondary structure by learning from the features extracted by the GNN and continuously optimizing its parameters to obtain the optimal hyperplane. Comparison and verification across various experiments show that the GNN-SVM model performs well on the task of protein secondary structure prediction and has broad application prospects. |
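
The way self-attention relates distant residues, as described in component (2), can be illustrated with a minimal sketch. It is not the SAEC or SA-GAN model from this work. It assumes identity query/key/value projections (real models learn them), a hypothetical toy sequence, and an all-zero placeholder in place of a real PSSM. The names `build_features` and `self_attention` are illustrative only.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acid types


def one_hot(sequence):
    """Encode each residue as a 20-dimensional indicator vector."""
    return [[1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS]
            for residue in sequence]


def build_features(sequence, pssm):
    """Concatenate one-hot rows with PSSM rows (20 + 20 features per residue)."""
    if len(pssm) != len(sequence):
        raise ValueError("PSSM needs exactly one row per residue")
    return [encoded + list(row) for encoded, row in zip(one_hot(sequence), pssm)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention with identity projections.

    Assumption: learned query/key/value matrices are omitted for brevity,
    so the attention weights depend only on raw feature similarity.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    weights = []
    for query in features:
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in features]
        weights.append(softmax(scores))
    # Each output row is a weighted average of all residue feature vectors.
    outputs = [[sum(w * value[d] for w, value in zip(row, features))
                for d in range(dim)]
               for row in weights]
    return outputs, weights


sequence = "MKVLAAGMK"  # hypothetical toy sequence, not real data
placeholder_pssm = [[0.0] * 20 for _ in sequence]  # stand-in for a real PSSM
features = build_features(sequence, placeholder_pssm)
outputs, weights = self_attention(features)
```

In this toy setup, the first residue attends to the identical residue seven positions away exactly as strongly as to itself, because the attention weight depends on feature similarity rather than on sequence distance. A convolution with a small kernel would need several stacked layers to connect the same two positions.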