
Predicting The Antigenic Variant Of Influenza Virus And Virus Host Based On Deep Learning Methods

Posted on: 2019-09-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Tan
Full Text: PDF
GTID: 2428330545950673
Subject: Software engineering
Abstract/Summary:
With the rapid development of DNA sequencing technology, a large number of genomic and protein sequences have become available. Many researchers have begun to mine these sequences, which has driven the rapid development of bioinformatics, and more and more statistical and machine-learning methods are being applied in the field. Deep learning is a recently developed machine-learning method that, helped by the exponential growth of computing speed in recent years, has set off a new wave of machine learning. It has been widely used in image processing and speech recognition with good results, and it has also been applied to some extent in bioinformatics. This thesis studies the prediction of influenza virus antigenic variation and of virus hosts using deep learning methods. The main work is as follows.

Firstly, prediction of influenza virus antigenic variation. Because the influenza virus mutates rapidly and its antigenicity changes often, immune escape and vaccine failure occur frequently. Rapid determination of the antigenicity of an influenza virus helps to identify antigenic variants in time. Based on the hemagglutinin (HA) protein sequence, a sparse autoencoder (SAE) model was established to predict the antigenic variation of human influenza A(H3N2) virus. The accuracy of the model in five-fold cross-validation reached 95%, which was better than that of the logistic regression, decision tree and SVM models. Further analysis of the model showed that most of the hidden-layer nodes that contributed most to antigenic variation were jointly determined by multiple residues. In addition, some input-layer features (HA protein residues) were observed to participate in several significant hidden-layer nodes, such as residues 189, 145 and 156, which have also been reported to be the main determinants of antigenic variation in influenza A(H3N2) virus.

Secondly, prediction of virus hosts. Viruses play a very important role in the ecological balance of the earth, the evolution of species, and human health. Because viruses are so diverse, our understanding of them is far from complete. Most viruses that infect humans receive attention only after they have caused serious harm to human life. Rapid determination of the host of an unknown virus helps to understand the virus and to guard against the potential threat it poses. In this thesis, a deep learning model, termed HDeep, was proposed to predict the host of a virus (one of archaea, bacteria, fungi, plants and animals) from its genomic sequence. It performed better than random forest and k-nearest neighbours models. In addition, for viruses that infect archaea or bacteria, methods based on CRISPR spacers and tRNA were used to further determine the specific host. For viruses that infect animals, an HDeep model was used to predict whether the host is a vertebrate or an invertebrate, and for viruses that infect vertebrates, an HDeep model was used to predict whether the host is human or a non-human vertebrate.

The study of influenza antigenic variation not only helps to identify antigenic variants quickly, but also sheds light on the molecular mechanism behind antigenic variation through analysis of the SAE model. The study of virus host prediction supports rapid determination of the host of an unknown virus and provides scientific guidance for virus prevention and control. This work is therefore both an exploration of the application of deep learning to biological problems and a contribution of practical value to the prevention and control of viruses.
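The abstract does not say how HA sequences were turned into input-layer features or how the folds were drawn, so the following is only a minimal sketch under stated assumptions. It assumes that a pair of aligned HA sequences is encoded as a per-residue mismatch vector (1 where the residues differ), and that the five folds are taken by simple striding over sample indices. The function names `mismatch_vector` and `k_fold_indices` and the toy sequences are hypothetical and do not come from the thesis.

```python
from typing import List, Tuple


def mismatch_vector(seq_a: str, seq_b: str) -> List[int]:
    """Encode two aligned, equal-length protein sequences as a 0/1 vector.

    Position i is 1 when the residues differ and 0 when they match.
    Assumption: this encoding is illustrative, not the thesis's own.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [int(a != b) for a, b in zip(seq_a, seq_b)]


def k_fold_indices(n_samples: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k (train, test) partitions.

    Assumption: folds are formed by striding (index i goes to fold i % k);
    a real experiment would usually shuffle or stratify first.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        # Training set is every index that is not in the held-out fold.
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    # Toy 8-residue fragments, not real HA sequences.
    print(mismatch_vector("QKLPGNDN", "QKIPGSDN"))
    for train_idx, test_idx in k_fold_indices(10, k=5):
        print(len(train_idx), test_idx)
```

Under this assumed encoding, each input-layer feature corresponds to one HA residue position, which is what allows individual positions such as 145, 156 and 189 to be traced into hidden-layer nodes.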
Keywords/Search Tags:bioinformatics, deep learning, influenza, antigen variation, virus host