
Characteristics Analysis For DNA Sequences Of The Influenza Virus Based On The Time Series Theory Methods

Posted on: 2012-07-23, Degree: Master, Type: Thesis
Country: China, Candidate: J Liu, Full Text: PDF
GTID: 2154330338954723, Subject: Applied Mathematics
Abstract/Summary:
Influenza is a recurring infectious disease that causes high morbidity and mortality worldwide. There are three types of influenza virus: A, B and C. Type A is the most virulent human pathogen of the three and causes the most severe disease. Humans experienced several influenza outbreaks in the 20th century, and the virus broke out again in 2009. Our understanding of the influenza virus is still incomplete, and its special properties need further study. Because the virus poses a great threat to human health, studying its DNA and protein sequences is an urgent task. Analyzing the properties of the influenza virus matters for prevention, for the development of new vaccines, for the design of drug molecules, and for the control and treatment of influenza.

After introducing the background of bioinformatics, we introduce the time series methods used to study biological sequences. These methods process dynamic data and can be used for analysis, forecasting and control. We present the definitions, properties and methods of the ARIMA(p,d,q) and ARFIMA(p,d,q) models, which we use to study the DNA and protein sequences of the influenza virus.

DNA sequences of the influenza virus are converted into CGR radian series based on their CGR coordinates, and a long-memory ARFIMA model is applied to DNA sequence analysis. We randomly select 10 H1N1 sequences and 10 H3N2 sequences for analysis. We find that they show marked long-range correlation and are fitted reasonably well by ARFIMA models, and that different ARFIMA models distinguish the two subtypes: ARFIMA(0,d,5) identifies H1N1 and ARFIMA(1,d,1) identifies H3N2. We then analyze the DNA sequences of influenza B and C viruses.
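The conversion of a DNA sequence into a CGR-based numeric series can be sketched as follows. This is a minimal illustration, not the thesis's own procedure: the corner assignment (A, C, G, T on the unit square) is one common CGR convention, and the abstract does not define the "radian" series, so taking the angle of each CGR point with `atan2` is an assumption. The function names `cgr_points` and `cgr_radians` are hypothetical.

```python
import math

# Assumed corner assignment on the unit square (one common CGR convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the chaos game representation (CGR) points of a DNA string.

    Starting from the centre of the square, each new point is the midpoint
    between the previous point and the corner of the current base.
    Characters other than A, C, G, T are skipped.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # ambiguous or unknown base: ignore it
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cgr_radians(seq):
    """Map each CGR point to an angle in radians.

    Assumption: the angle is measured from the origin with atan2(y, x);
    the thesis may define its radian series differently.
    """
    return [math.atan2(y, x) for x, y in cgr_points(seq)]


if __name__ == "__main__":
    series = cgr_radians("ATGGCA")
    print(series)
```

The resulting list is an ordinary real-valued time series, one value per base, which is the kind of input a model such as ARFIMA expects.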
In 10 randomly selected DNA sequences of influenza B virus and 10 of influenza C virus, we again find marked long-range correlation and a reasonable fit by ARFIMA models, and we find that different ARFIMA models can distinguish these two types as well. As a classical time series model with well-developed algorithms, the ARFIMA model can help reveal unknown properties of DNA sequences.

We forecast the bases of influenza A virus DNA sequences with the ARIMA model, which is significant for the study of the H1N1 virus. We choose 41 highly homologous H1N1 virus sequences, then fit and forecast the first 20 positions using an ARIMA(p,d,q) model. The forecast plot shows that the original data fall within the forecast region, which indicates that the model is reasonable and forecasts well. We then analyze the amino acid sequences of the H1N1 virus by the same method and again obtain good forecasts.
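What separates the ARFIMA(p,d,q) model from ARIMA(p,d,q) is that the differencing order d may be non-integer, which is how it captures long-range correlation. The sketch below shows only that fractional differencing step, using the standard binomial expansion of (1 - B)^d, where B is the backshift operator. It is an illustration under stated assumptions, not the thesis's fitting procedure: the function names are hypothetical, and estimating d, p and q is not shown.

```python
def frac_diff_weights(d, n):
    """First n coefficients of (1 - B)^d, with B the backshift operator.

    Recursion: w[0] = 1, w[k] = w[k-1] * (k - 1 - d) / k.
    For d = 1 this gives 1, -1, 0, 0, ... (ordinary first differencing).
    """
    weights = [1.0]
    for k in range(1, n):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(series, d):
    """Apply (1 - B)^d to a series, truncating the filter at the series start."""
    w = frac_diff_weights(d, len(series))
    return [
        sum(w[k] * series[t - k] for k in range(t + 1))
        for t in range(len(series))
    ]


if __name__ == "__main__":
    # For 0 < d < 0.5 the weights decay slowly, which reflects long memory.
    print(frac_diff_weights(0.3, 6))
    print(frac_diff([1.0, 2.0, 4.0, 7.0], 1))
```

With an integer d the filter reduces to the usual ARIMA differencing, so the same routine covers both models. After differencing, the remaining ARMA(p,q) part would be fitted to the filtered series.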
Keywords/Search Tags: influenza virus, DNA sequence, chaos game representation (CGR), time series model, ARIMA model, ARFIMA model, forecast