Transmembrane proteins usually form stable oligomers to activate specific biological functions. For example, some ligands bind transmembrane proteins only as dimers to regulate specific signal transport across the lipid bilayer. Receptor malfunction caused by oligomer conformational changes leads to diseases such as cancer, diabetes, and cystic fibrosis. Algorithms that predict oligomers and their key residues from transmembrane protein sequences therefore help in understanding the biological function of transmembrane proteins and in drug design.

Previous oligomer prediction models extract only discrete frequency features of protein sequences and ignore the influence of sequence context on the oligomerization of transmembrane proteins. Moreover, existing motif discovery models rely on multiple homologous sequences to search for highly conserved subsequence patterns, and they cannot distinguish genuinely functional motifs from these over-represented subsequence patterns. As the attention mechanism has become widely used in deep learning, a natural approach to oligomer motif prediction is to use attention to locate high-weight subsequences as potential oligomer motifs during oligomer prediction of transmembrane proteins. This paper therefore proposes attention-based Global-Local structure location models to recognize transmembrane domains and oligomer motifs. Experimental results show that the proposed models successfully locate multiple transmembrane domains and most known dimer motifs. The main contributions of this paper are twofold.

(1) An attention-based Global-Local structure model named GLMFD is proposed to locate transmembrane domains. A transmembrane domain is a subsequence pattern formed by a specific distribution of amino acid residues in the protein sequence. The Global-Local structure makes it convenient to extract local subsequence pattern features and global context information from transmembrane protein sequences. Notably, force-to-one and force-to-zero penalties are proposed to encourage the attention mechanism to focus on multiple transmembrane domains. Experiments show that GLMFD achieves 63.6% accuracy in transmembrane protein prediction, 57.2% accuracy in transmembrane domain prediction, and a positional deviation of 6.471. In addition, three contrast experiments demonstrate the necessity of the penalties, the importance of sequence context information, and the influence of local subsequence selection.

(2) An attention-based Global-Local structure Bi-LSTM model named GLTM is proposed to predict transmembrane protein oligomers and screen their oligomer motifs. Compared with GLMFD, GLTM makes three improvements. First, because amino acid residues in a transmembrane domain are strongly coupled, GLTM changes the local-layer network to strengthen its ability to extract context information. Second, GLTM combines a random-window selection method with an improved one-hot encoding to form a novel data augmentation method. Third, to avoid mislocalization by the attention mechanism, a new position penalty is proposed to encourage GLTM to focus on known oligomer motifs. Experiments show that GLTM locates most known oligomer motifs while achieving 97.37% prediction accuracy. Visualization results show that the position penalty helps locate specified subsequence patterns.
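The force-to-one and force-to-zero penalties can be illustrated with a minimal sketch. The exact formulation used in the paper is not given here, so the squared-error form, the function name `attention_penalty`, and the weighting parameters below are illustrative assumptions: attention weights inside annotated transmembrane regions are pushed toward 1, and weights outside them toward 0.

```python
import numpy as np

def attention_penalty(attn, region_mask, lambda_one=1.0, lambda_zero=1.0):
    """Illustrative sketch (not the paper's exact loss) of the
    force-to-one / force-to-zero penalties.

    attn        : (L,) attention weights over sequence positions, in [0, 1]
    region_mask : (L,) binary mask, 1 inside annotated transmembrane domains
    Returns a scalar penalty to be added to the training loss.
    """
    attn = np.asarray(attn, dtype=float)
    mask = np.asarray(region_mask, dtype=float)
    # force-to-one: attention inside annotated domains should approach 1
    force_to_one = np.sum(mask * (1.0 - attn) ** 2)
    # force-to-zero: attention outside the domains should approach 0
    force_to_zero = np.sum((1.0 - mask) * attn ** 2)
    return lambda_one * force_to_one + lambda_zero * force_to_zero

# Attention concentrated on the annotated region incurs a low penalty,
# while attention placed outside it is penalized more heavily.
focused = attention_penalty([0.05, 0.90, 0.95, 0.10], [0, 1, 1, 0])
misplaced = attention_penalty([0.90, 0.05, 0.05, 0.90], [0, 1, 1, 0])
print(focused < misplaced)  # → True
```

The same idea extends to multiple transmembrane domains by marking every annotated region in `region_mask`, which is one way a penalty can encourage the attention mechanism to cover all domains rather than collapsing onto a single one.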