Research On Classification Method Of Tobacco Mosaic Virus And Plant Resistance Protein Based On Machine Learning

Posted on: 2023-07-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Chen
GTID: 2543306842980299
Subject: Forestry Engineering
Abstract/Summary:
Proteins are the products of gene expression and the basic carriers of life functions, which makes proteomics an important area of research in the post-genomic era of the life sciences. In general, plant genomes are larger than animal genomes, contain more repetitive sequences, and are more heterozygous, so the structure-function diversity of their proteins is more complex. Accurate prediction and precise classification of plant proteins are prerequisites for a deeper understanding of the material basis of plant life activities at the molecular level. With the development of plant genomics, the volume of plant protein data has grown significantly, and traditional biological experiments for determining plant protein classes are time-consuming and expensive. It is therefore necessary to use machine learning algorithms to identify plant proteins. This thesis applies machine learning algorithms to two key problems related to plant proteins: the classification of tobacco mosaic virus proteins and the classification of plant resistance proteins. The details are as follows.

Plant diseases can be caused by a combination of pathogens, and eradicating them requires identifying the type of pathogen responsible for the infection. Tobacco mosaic virus is a common plant virus that can infect more than 350 plant species. Common methods for identifying tobacco mosaic infection in leaves are time-consuming and laborious, so it is important to use machine learning methods to classify tobacco mosaic virus proteins. In this thesis, tobacco mosaic virus proteins were collected from the UniProt database, and a benchmark dataset was constructed by redundancy removal and other operations. Features were extracted from the protein sequence information using three methods: Amino Acid Composition, Composition of Physical and Chemical Properties, and Composition of k-spaced Amino Acid Pairs. Two combinations were also evaluated: Amino Acid Composition with Composition of Physical and Chemical Properties, and Composition of k-spaced Amino Acid Pairs with Composition of Physical and Chemical Properties. Results on the independent test set showed that the support vector machine algorithm combined with the Composition of Physical and Chemical Properties feature extraction method gave the best accuracy and robustness in classifying tobacco mosaic virus proteins. This also provides a clue for deeper analysis of tobacco mosaic virus proteins in the future.

Plant resistance proteins have evolved in response to complex environmental changes and to pest and disease infestation during plant development. Work on classifying plant proteins facilitates further exploration of plant disease resistance mechanisms. In this thesis, resistance protein data from a variety of plants were collected, and redundancy was removed from the negative samples to construct a benchmark dataset. So that the models could extract effective features, the proteins were encoded with several protein coding methods: methods based on amino acid sequence information, on Amino Acid Pairs, on Grouped Amino Acid Pairs Composition, and on Amino Acid Property Information Composition. Machine learning methods and convolutional neural network methods were then used to classify the plant resistance proteins, and, based on the training results, the top-performing feature extraction methods were combined. The experimental results and evaluation metrics show that the support vector machine algorithm combined with CTDT and CTriad achieves high accuracy and robustness.
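The sequence-based encodings named in the abstract can be illustrated with a short sketch. The code below is not the thesis implementation; it only shows, under common textbook definitions, how Amino Acid Composition and Composition of k-spaced Amino Acid Pairs turn a protein sequence into a fixed-length feature vector. The function names, the `k_max` parameter, and the choice to skip non-standard residues are assumptions made for illustration.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return 20 values: the fraction of each standard residue in seq.

    Assumption: non-standard symbols (e.g. 'X') are ignored.
    """
    seq = seq.upper()
    n = sum(1 for residue in seq if residue in AMINO_ACIDS)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def k_spaced_pair_composition(seq, k_max=2):
    """Return 400 * (k_max + 1) values.

    For each gap k in 0..k_max, count residue pairs separated by exactly k
    positions and normalise by the number of valid pairs for that gap.
    """
    seq = seq.upper()
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard residues
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in pairs)
    return features


if __name__ == "__main__":
    example = "ACAC"  # toy sequence, not real data
    print(len(amino_acid_composition(example)))
    print(len(k_spaced_pair_composition(example, k_max=1)))
```

Combining two encodings, as the abstract describes, is assumed here to mean concatenating their vectors (for example `amino_acid_composition(s) + k_spaced_pair_composition(s)`); the resulting vectors could then be passed to a classifier such as a support vector machine.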
Keywords/Search Tags:Protein classification, Machine learning, Tobacco mosaic virus proteins, Plant resistance proteins