A Deep Learning Model For The Prediction Of Bacterial Virulence Factors

Posted on: 2021-05-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D D Zheng
Full Text: PDF
GTID: 1484306308482454
Subject: Biomedical engineering
Abstract/Summary:
Bacterial infectious diseases pose a significant threat to public health worldwide. Despite recent advances in the prevention, diagnosis and treatment of bacterial infection, deciphering the molecular basis of bacterial pathogenicity remains one of the main focuses of current microbiology. The pathogenicity of pathogenic bacteria depends on the virulence factors encoded in their genomes. Virulence factors (VFs) are the genetic elements that enable microbes to establish infection and cause disease in their hosts. Because emerging bacterial infectious diseases are usually caused by variant clones that have acquired additional VFs via horizontal gene transfer, a better understanding of bacterial VFs is also critical for more effective prevention and control of these diseases.

With the recent development of next-generation sequencing technology and microbial genomics, a great number of bacterial genomes have been sequenced. Identifying and predicting potential VFs efficiently and accurately from this volume of genomes has become a challenging task in bioinformatics. Sequence similarity based alignment is the most popular approach for detecting potential VFs from closely related sequences. Traditional machine learning methods have also been used to predict some categories of bacterial VFs, such as effectors of secretion systems. However, no sequence alignment independent method for identifying and predicting the full range of bacterial VFs has been available so far. Traditional machine learning methods rely heavily on prior knowledge to extract predefined features for initial model training, whereas deep learning, a newer branch of machine learning, can learn expressive features from the raw data automatically. Indeed, deep learning methods have been applied successfully in many areas of the biomedical field in recent years.

In this study, we first extracted the bacterial VF dataset from the virulence factors database (VFDB), which covers 24,739 bacterial VF-related genes from 32 genera of bacterial pathogens. To collect more training data, we expanded the VFDB dataset with all complete genomes of the 32 bacterial genera available from NCBI, building a comprehensive dataset of 160,495 sequences from 3,446 VF categories. We then constructed a convolutional neural network model named VFNet to classify bacterial VFs, and verified the rationality of the model structure and the necessity of our data expansion. Finally, we compared VFNet with two newly published deep learning models and four traditional machine learning algorithms.

On the dataset with sufficient samples (more than 10 samples in each class), VFNet achieved the highest accuracy and the fastest model training speed of the three deep learning models. Further, by combining predefined features, VFNet achieved the highest accuracy (0.9831) and F1-score (0.9803) when compared with the traditional machine learning algorithms. On the dataset with insufficient samples (no more than 10 samples in each class), VFNet also achieved the best classification performance by using transfer learning, improving accuracy by 1% to 13% and F1-score by 1% to 16% over the best traditional machine learning algorithm. In addition, we showed that VFNet is able to recognize conserved protein domains, which provides a degree of biological interpretation for its accurate classification. We also explored the impact of high sequence similarity and of the genome origin of sequences on the classification performance of VFNet to further verify the generalization ability of our model.

In summary, we constructed the largest bacterial VF dataset, comprising more than 160,000 sequences, which provides a good resource for future research on VF prediction. Furthermore, as the first attempt to apply deep learning algorithms to classify all categories of bacterial VFs, our convolutional neural network model (VFNet) showed significant advantages over other machine learning methods. The results presented here form a solid basis for further development of sequence alignment free applications for the identification and prediction of various bacterial VFs.
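
As an illustration of how a convolutional filter can respond to a conserved sequence pattern without any alignment step, the following minimal Python sketch one-hot encodes a protein sequence and slides a single hand-built kernel along it. It is not the VFNet architecture, which the abstract does not specify; the function names, the example sequence, and the motif "GKT" are all hypothetical choices made for this sketch, and a trained network would learn its kernels rather than have them written by hand.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(seq, max_len):
    """Return a (max_len, 20) matrix; padding and unknown residues stay all-zero."""
    encoded = np.zeros((max_len, len(AMINO_ACIDS)), dtype=np.float32)
    for pos, aa in enumerate(seq[:max_len].upper()):
        idx = AA_INDEX.get(aa)
        if idx is not None:
            encoded[pos, idx] = 1.0
    return encoded


def conv1d_global_max(encoded, kernel):
    """Slide a (width, 20) kernel along the sequence; return (max response, position)."""
    width = kernel.shape[0]
    n_windows = encoded.shape[0] - width + 1
    responses = np.array(
        [np.sum(encoded[i:i + width] * kernel) for i in range(n_windows)]
    )
    best = int(np.argmax(responses))
    return float(responses[best]), best


# Hypothetical motif: the kernel is simply the one-hot encoding of "GKT",
# so its response counts how many residues in a window match the motif.
motif = "GKT"
kernel = one_hot_encode(motif, len(motif))

# Hypothetical sequence, zero-padded to a fixed length of 12.
x = one_hot_encode("MSTAGKTLLV", max_len=12)
score, position = conv1d_global_max(x, kernel)
print(score, position)  # 3.0 4
```

The global maximum over positions makes the response independent of where the pattern occurs in the sequence, which is one reason convolutional models are a natural fit for detecting conserved domains in sequences of varying length.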
Keywords/Search Tags: Bacterial Infectious Disease, Deep Learning, Convolutional Neural Network, Virulence Factors