
Data Analysis Of High Throughput Sequencing Of Emerging Infectious Viral Disease Pathogens

Posted on: 2017-03-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Fan
GTID: 1224330488455758
Subject: Military Preventive Medicine
Abstract/Summary:
Many emerging infectious diseases have appeared in recent years. SARS coronavirus, MERS coronavirus, avian influenza virus, Ebola virus and other emerging pathogens threaten human life. These pathogens are diverse, and classical methods of pathogen analysis cannot fully meet research needs. High throughput sequencing is a developing technology, and its use in analyzing emerging infectious agents is growing quickly. Bioinformatics methods combined with high throughput sequencing can effectively improve the response to emerging infectious diseases. In this study, we selected pathogens of emerging infectious diseases that occurred in recent years in China and abroad, and analyzed their diversity and evolution using high throughput sequencing and related analysis methods. The study can provide data for epidemic prevention and control. The main results are as follows:

Sequencing analysis of the pathogens of adenovirus outbreaks in the Chinese military. Adenoviruses were identified by high throughput sequencing in different fever outbreaks in the Chinese military. Full-length genomes of the pathogens were assembled from the sequencing data. The study indicated that the outbreaks were mainly caused by HAdV-7, HAdV-14 and HAdV-55. The characteristics of these viruses were analyzed by comparative genomics.

Evolutionary dynamics analysis of the 2014 Ebola virus outbreak in West Africa. We sequenced samples from patients with Ebola virus disease using high throughput sequencing and assembled 175 full-length Ebola virus sequences. The sequences were compared with sequences from public databases, and 440 substitution sites were found in the newly sequenced data. Some of the substitutions have become fixed, indicating that the virus was still adapting to the host.
Seven different sublineages had been found in Sierra Leone at the time of the research, indicating that the virus continued to diversify after spreading to Sierra Leone. Phylogeographic analysis identified three geographical sites important for epidemic prevention and control in the Western Area of Sierra Leone. The substitution rate of the outbreak, calculated from the latest sequences, was consistent with that of previous outbreaks, indicating no sudden acceleration of mutation; this dispelled the international community's concern about rapid mutation of the virus. This study revealed the genetic diversity and evolutionary dynamics of Ebola virus in Sierra Leone from a global perspective, which can provide a reference for epidemic prevention and control in Sierra Leone.

Reassortment and coinfection of H7N9 avian influenza virus. Sequences were assembled from the high throughput sequencing results of eight poultry swab samples from Jiangsu Province, China. The H7N9 influenza viruses in this study were highly similar to H7N9 viruses that had infected humans. The H9N2 viruses were consistent with local H9N2 influenza viruses. Coinfection with H7N9 and H9N2 viruses was found in four samples, and reassortment was found in one of the coinfected samples. These results suggest that the reassortment process was complex and dynamic. The HA and NA segments were transmitted from abroad and the six internal segments were introduced from local strains. The virus was generated by a reassortment event involving three different virus types. The analysis indicated the origin and transmission trend of H7N9 avian influenza virus, and ongoing reassortment was observed.

Diversity of anelloviruses in the Chinese population. Nucleic acids were extracted from 33 blood samples and sequenced by high throughput sequencing combined with multiple displacement amplification (MDA).
The infection rate of anelloviruses in the Chinese population was very high, and coinfection occurred in most samples. Different genera of anelloviruses were detected in some hosts. Viruses isolated from a single host showed great diversity and belonged to different sublineages. The results indicated that anellovirus infection is widespread and that the evolution and transmission of the virus are unexpectedly complex. There was no direct evidence that anellovirus infection is related to human disease, but a potential correlation between infection and disease may exist. The many sequences obtained in this study will benefit follow-up research.

Discovery of a novel flavivirus in Yunnan Province, China. The full-length genome of a virus isolated from Aedes mosquitoes in Dali, Yunnan Province, was assembled from high throughput sequencing data. The virus was confirmed to be a new species of the genus Flavivirus and was certified by the International Committee on Taxonomy of Viruses. Its genome structure was similar to that of previously reported Aedes-specific flaviviruses, containing an open reading frame encoding a polyprotein and an open reading frame resulting from a frameshift. Like other insect-specific flaviviruses, the virus did not cause a cytopathic effect in Vero cells, suggesting that it may not infect mammals.

Efficient algorithm for fast pathogen screening from raw high throughput sequencing data. Conventional analytical methods are not suitable for screening pathogens in samples that lack pathogen information or in which the presence of a pathogen cannot be confirmed. When samples from suspected infections are sequenced directly by high throughput sequencing, pathogen information can be screened from the large number of sequenced reads. An efficient algorithm was devised to improve the identification of pathogens.
The method can screen pathogens from complex clinical samples, and the results can guide subsequent experiments to culture or isolate possible pathogens in the samples. Dozens of cases have been processed with the algorithm, and the pathogens were successfully identified. Some results have been experimentally verified.

Development of related bioinformatics software. To reduce human intervention, speed up analysis and reduce the error rate, a series of programs was designed that automates the analysis pipeline. These programs were applied in daily research work and played an important role in various research projects.

In summary, this study established a series of bioinformatics analysis methods for emerging viruses based on high throughput sequencing technology, and summarizes data processing strategies for different pathogens. For known pathogens, we assembled the whole genome sequence from raw data and analyzed genetic diversity and evolutionary dynamics using a combination of multiple sequence alignment and phylogenetic analysis. For relatively new pathogen species, we obtained the complete genome sequence by assembly and used sequence comparison and gene annotation to analyze possible pathogenicity and other characteristics. For samples with unknown pathogens, high throughput sequencing was applied to the sample and possible pathogen information was discovered directly from the sequencing reads.
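
The abstract does not describe how the screening algorithm works, so the following is only an illustrative sketch of one common way to screen raw reads against candidate reference genomes: exact k-mer matching. It is not the author's method. The function names, the k-mer length, the `min_fraction` threshold and the toy sequences are all assumptions made for illustration, and only the forward strand is considered.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer of a DNA sequence, skipping k-mers that contain N."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:
            yield kmer


def build_index(references, k):
    """Map each k-mer to the set of reference names that contain it."""
    index = {}
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(name)
    return index


def screen_reads(reads, index, k, min_fraction=0.5):
    """Count reads assigned to each reference.

    A read is assigned to the reference sharing the most k-mers with it,
    provided at least min_fraction of the read's k-mers hit that reference.
    The threshold is a hypothetical choice, not a value from the source.
    """
    counts = Counter()
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read shorter than k, or all k-mers contain N
        hits = Counter()
        for kmer in read_kmers:
            for name in index.get(kmer, ()):
                hits[name] += 1
        if hits:
            name, n_hits = hits.most_common(1)[0]
            if n_hits / len(read_kmers) >= min_fraction:
                counts[name] += 1
    return counts


# Toy data (made up): two short "reference genomes" and four reads.
references = {
    "virus_A": "ACGTACGTTAGCCGATAGGCTTAACG",
    "virus_B": "TTGACCGGTATCCGGAAATTCGCGTA",
}
reads = [
    "ACGTTAGCCGATAG",  # substring of virus_A
    "CCGGTATCCGGAAA",  # substring of virus_B
    "GGGGGGGGGGGGGG",  # matches neither reference
    "TAGCCGATAGGCTT",  # substring of virus_A
]

index = build_index(references, k=8)
counts = screen_reads(reads, index, k=8)
print(counts)
```

In practice, a ranked list of per-reference read counts like this would only suggest candidate pathogens; as the abstract notes, such results guide follow-up culture, isolation or other experimental verification.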
Keywords/Search Tags: emerging infectious diseases, high throughput sequencing, evolutionary analysis, metagenomics, pathogen screening