Leucine-rich repeats (LRRs) are short sequence motifs characterized by a conserved pattern rich in hydrophobic leucine residues, and they are present in most immune receptors, such as LRR-RLKs, LRR-RLPs and NBS-LRRs. Existing prediction tools are mainly based on hidden Markov models, position-specific matrices and traditional machine learning algorithms. Here, we applied deep learning to develop a new tool for predicting LRR domains in unknown protein sequences. We first constructed positive and negative sample datasets of LRR units based on a re-constructed pattern of the highly conserved segment of LRR units. Subsequently, we developed DeepLRR, which combines LRR domain sequence features with a convolutional neural network (CNN) model, together with a corresponding web service. In addition, we re-annotated the LRR-RLK and LRR-RLP genes in the tomato, Arabidopsis and rice reference genomes, and performed chromosome mapping, gene cluster analysis, tandem repeat analysis and phylogenetic analysis of the LRR-RLK genes in the three plants. Finally, weighted gene co-expression network analysis (WGCNA) was performed using published Arabidopsis LRR-RLK gene expression data, and several potential receptor and co-receptor genes were identified.
The specific research results are as follows:
1. We collected 1,849 LRR protein sequences and 174,800 non-LRR protein sequences from the Swiss-Prot database. After data preprocessing and reconstruction of the highly conserved segment pattern of LRR units, we constructed positive and negative sample datasets of LRR units.
2. To compare the predictive ability of deep learning and traditional machine learning models, we trained and tested CNN, support vector machine (SVM), random forest (RF) and naive Bayes (NB) models for LRR unit prediction on the same dataset. To predict complete LRR domains in unknown protein sequences, we developed DeepLRR by combining a CNN model with LRR domain sequence features; DeepLRR achieved a higher F1-score than six other prediction tools on the same dataset.
3. To make DeepLRR more practical for research use, we developed the DeepLRR web service and equipped it with an identification pipeline for plant disease-resistance proteins (LRR-RLK, LRR-RLP and NBS-LRR) and their non-canonical domains.
4. Based on DeepLRR, we re-annotated the LRR-RLK and LRR-RLP genes in the tomato, Arabidopsis and rice reference genomes, and performed chromosome mapping, gene cluster analysis, tandem repeat analysis and phylogenetic analysis of the LRR-RLK genes.
5. Building on the above work, we collected expression data for Arabidopsis LRR-RLK genes across 48 samples and combined them with the LRR unit features of known receptor and co-receptor pairs and with the WGCNA results to screen for potential receptor and co-receptor genes.
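The abstract does not spell out the re-constructed highly conserved segment pattern, so the sketch below uses the widely cited LRR consensus LxxLxLxxNxL (x is any residue) as a stand-in. It illustrates how candidate LRR units could be located in a protein sequence and one-hot encoded into a fixed-size matrix of the kind a CNN can take as input. The residue sets `HYDROPHOBIC` and `ASPARAGINE_LIKE`, the helper names and the toy sequence are hypothetical; this is not DeepLRR's code or its actual pattern.

```python
import re

# The 20 standard amino acids, used as one-hot columns.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: residues accepted at the "L" (hydrophobic) and "N" positions of
# the textbook LRR consensus LxxLxLxxNxL. These sets are illustrative choices,
# NOT the re-constructed pattern used to build DeepLRR's datasets.
HYDROPHOBIC = "LIVFM"
ASPARAGINE_LIKE = "NCTS"


def build_hcs_pattern():
    """Compile a regex for the 11-residue consensus LxxLxLxxNxL.

    The capture group sits inside a zero-width lookahead so that
    overlapping candidate segments are all reported by finditer().
    """
    h = f"[{HYDROPHOBIC}]"
    n = f"[{ASPARAGINE_LIKE}]"
    return re.compile(f"(?=({h}..{h}.{h}..{n}.{h}))")


HCS_PATTERN = build_hcs_pattern()


def find_hcs_segments(sequence):
    """Return (0-based start, segment) for every consensus match."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in HCS_PATTERN.finditer(seq)]


def one_hot(segment):
    """Encode a segment as a len(segment) x 20 matrix of 0/1 values.

    Residues outside the 20 standard amino acids (e.g. 'X') become all-zero rows.
    """
    return [[1 if residue == aa else 0 for aa in AMINO_ACIDS] for residue in segment]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from Swiss-Prot.
    toy = "MKLSSLKLSNNQLAG"
    for start, seg in find_hcs_segments(toy):
        matrix = one_hot(seg)
        print(start, seg, len(matrix), "x", len(matrix[0]))
```

Scanning with a fixed consensus only yields candidate windows; in a learning setup such windows would still need positive/negative labels before training a classifier, and how DeepLRR selects, pads or encodes its units is not described in this abstract.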