
Mapping The Distribution And Prediction Of Middle East Respiratory Syndrome And Pathogen's Phylogeographic Study

Posted on: 2022-07-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: A R Zhang
GTID: 1484306311976759
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Background
Middle East Respiratory Syndrome (MERS) is a respiratory infectious disease first discovered in the Kingdom of Saudi Arabia in September 2012. The disease is caused by the Middle East respiratory syndrome coronavirus (MERS-CoV), which can be highly pathogenic in humans. MERS has a high case fatality rate. Individuals infected with MERS-CoV may experience no illness, mild or severe respiratory illness, or even death, and the disease is difficult to distinguish from other similar respiratory diseases. It has gradually spread from its main epidemic area in the Middle East to 27 countries on four continents. The vast majority of MERS cases were reported by Saudi Arabia, followed by South Korea. Frequent travel by visitors and worshippers to and from the Middle East has raised concern about a global pandemic, given the lack of effective treatment and prevention strategies. MERS has attracted great attention from countries around the world and has become a research hot spot in global public health. In February 2018, WHO formally incorporated MERS into the Research and Development Blueprint to promote research in this area.
It has been eight years since the emergence of MERS. Much progress has been made in research on its epidemiology, etiology, diagnosis and control, but some shortcomings remain. Current studies mostly examine either the epidemiological characteristics of the disease or the evolutionary dynamics of the pathogen in isolation. Most study designs failed to comprehensively consider the heterogeneity of the spatiotemporal distribution and made little use of relevant factors such as socioeconomic, meteorological and geographic variables. Very few studies have systematically analyzed the spatial diffusion of MERS and its associated risk factors. A rigorous assessment of the ecology of MERS-CoV at the global scale is also urgently needed to improve surveillance in high-risk areas in preparation for a potential pandemic. By
integrating spatial information technology, machine learning models and phylogeographic techniques, and by using the most up-to-date data, this research comprehensively considers the effects of biological, environmental and sociological factors, and systematically explores the dynamics and risk assessment of MERS at the individual, population and molecular levels. Our study will help improve the current understanding of the epidemiological characteristics of MERS, identify high-risk areas and vulnerable populations, inform appropriate control measures and prevention strategies for different regions, and contribute to the formulation of long-term intervention strategies. All of these will be of great significance to public health.

Objectives
1. To describe the epidemiological distribution characteristics and epidemiological regions of global MERS, focusing on the differences between animal-induced infections and human-to-human infections.
2. To investigate risk factors for fatality and how their effects are modified by one another, and to assess the roles of a variety of environmental, socioeconomic and biological factors in the spatial diffusion of MERS-CoV.
3. To evaluate the contributions of a variety of environmental, socioeconomic and biological factors to the ecological suitability of MERS-CoV using several popular machine learning algorithms and their ensemble, and to produce a predicted risk map of the study area.
4. To analyze the evolutionary lineages, spatiotemporal migration and positive selection sites of MERS-CoV, and to incorporate lineage information at the individual and group levels to further explore the influence of viral evolution on the epidemic characteristics of MERS.

Methods
1. Data collection: Data on confirmed MERS cases before June 1, 2020, were collected from official reports of WHO, which were cross-validated with and supplemented by data from the Food and Agriculture Organization (FAO) of the United Nations, the health departments of affected countries and
the literature. Together, these sources formed the MERS individual case database. Records of MERS-CoV detection in animals during the same period were obtained from FAO and a literature review. The following socioenvironmental variables, potentially related to the transmission of MERS or contributing to the ecology of MERS-CoV, were collected from the official websites or databases of the corresponding fields: population density, camel density, monthly meteorological data, elevation, land cover, and the distribution of transportation and hospitals. All raw data were cleaned, classified and extracted according to data quality control requirements, forming a regional multi-element database. Full-genome sequences (>30,000 bp) of MERS-CoV released before June 1, 2020, as well as the corresponding protein and coding sequences, were retrieved from GenBank. After verification of sampling area and sampling time, the MERS-CoV molecular sequence database was formed.
2. Statistical analysis:
(1) The spatiotemporal distribution and epidemiological characteristics of global MERS cases were described, and the differences in demographic and clinical features between cases with different transmission modes (animal-induced transmission and human-to-human transmission) were clarified.
(2) Logistic regression was performed to explore risk factors associated with the survival outcome of MERS cases. Two-way interactions among age group, sex and animal contact history were considered.
(3) A Cox proportional hazards model was used to assess which socioenvironmental variables were associated with the diffusion of MERS cases in the Middle East. A spatial trend contour plot was developed to visualize the spatial diffusion of the disease using the trend surface analysis module in ArcGIS.
(4) Combining demographic information with biological, environmental and socioeconomic factors, machine learning methods were used to assess the ecological niches of MERS-CoV within the study area. First, according to the
traditional strategy, three commonly used base models were fitted: the boosted regression tree (BRT) model, the random forest model and the support vector machine model. The optimal model was selected through model evaluation. Second, an ensemble model was built by stacking the three base machine learning models to obtain better generalization performance. An updated picture of the suitable niches of MERS-CoV was provided based on the results of the ensemble model. Important contributors were extracted from the optimal base model, and a logistic regression model was then used to estimate their effect sizes.
(5) The whole-genome sequences were analyzed using toolkits provided by the Nextstrain framework. A phylogenetic tree was built using a maximum likelihood approach to study the genetic diversity and evolutionary process of MERS-CoV. Phylogeographic analysis was used to infer the divergence time, the discrete traits of the ancestral nodes (location and host), and the geographic transmission history across the tree. To detect sites under positive selection among the coding sequences of protein genes, the branch-site model in PAML was used. Using statistical matching and sampling methods, lineage information at the molecular level was correlated with disease status at the population and individual levels to explore whether the pathogenicity of MERS-CoV changes with genetic evolution.
3. Software: ArcGIS 10.5, R 3.6.2, Python 2.7, Nextstrain, MAFFT v7.407, IQ-Tree v1.6.10, PAL2NAL v14, PAML v4.9 and CorelDRAW X8 were used to conduct data analysis.

Results
1. Based on the individual case database with a total of 2450 laboratory-confirmed MERS cases, 150 case clusters were derived. Death occurred in 802 patients, giving a case fatality rate (CFR) of 32.73%. The median age was 53 (IQR: 38-65) years, and 69.35% of cases were male. Health-care workers accounted for 13.67% of all patients. Among the 1453 patients with known
exposure history, 356 (24.04%) reported animal contact. In terms of spatial distribution, MERS cases tended to be distributed in tropical regions and the northern hemisphere. The region with the most cases was the Middle East; by country, the Kingdom of Saudi Arabia reported the most cases, followed by South Korea. Zoonotic infections occurred only in the Middle East, although cases with animal contact had also been imported into Europe and Southeast Asia. In terms of temporal distribution, there were three peaks between 2014 and 2015, which mainly originated from outbreaks in Saudi Arabia and South Korea. The seasonal distribution peaked in spring (April to June). Compared with cases without animal contact, cases with animal contact had a significantly higher CFR and were more likely to be older, to be male, to have underlying conditions, and to have a longer delay from disease onset to diagnosis. The two transmission modes also differed in seasonality. Animal-induced transmission events occurred mainly from January to March, and human-to-human transmission peaked subsequently from April to June.
2. In the analysis of determinants of case fatality, elderly patients (≥65 years old), men, Middle Eastern residents, cases with underlying diseases and cases with animal contact before onset were all at higher risk of death. We identified significant negative two-way interactions among age group, sex and animal contact. The effect of age was more pronounced in cases without animal contact history and in female cases. The analysis also revealed that the effect of animal contact history varied by age group and sex. Animal contact history was a risk factor for death in younger (<65) female patients.
3. In the analysis of the spatiotemporal spread of MERS, the disease was found to have spread rapidly from the central Arabian Peninsula to the surrounding areas from April 2014 to the end of 2015, spreading more rapidly eastward, in the direction of the United Arab Emirates and Oman. We found that the road and railway traffic network played an important role in
the early rapid regional dispersion of MERS. In addition, altitude, percentage of bareland coverage and the number of hospitals were also risk factors for the diffusion of MERS.
4. All base models and the ensemble machine learning model showed satisfactory performance in predicting the presence of MERS-CoV, with the lowest AUC being 84.14% on the test data. The ensemble model attained the highest predictive performance, with a mean AUC of 91.66% on the meta test data. We mapped the average predicted probabilities of MERS-CoV presence given by the ensemble model. High- and moderate-risk areas spanned the Middle East, West Asia, all of North Africa and a small part of East Africa, a much wider range than the geographic area in which human MERS cases or positive animal samples have been recorded. Southern Europe, eastern Africa and southern Africa were predicted to have mild risk, consistent with the observation of only a few human cases or positive animal samples in these regions. In the sensitivity analysis, the risk map obtained from the maximum entropy model, another popular method for presence-only data, showed a comparable distribution. Based on the BRT model, we found that bareland coverage was the leading contributor to the risk of MERS-CoV presence, with a relative contribution (RC) of 30.06%, followed by forest coverage with an RC of 10.74%. Population density, annual mean temperature, cropland coverage and camel density had moderate RCs, ranging from 6.20% to 7.28%. According to the risk curves, a higher risk of MERS-CoV presence was associated with higher levels of bareland coverage, population density, annual mean temperature and camel density, and with lower levels of forest coverage and cropland coverage. Multivariable logistic regression analysis based on these important contributors gave results that were broadly consistent with the BRT model. A higher level of bareland coverage had the largest effect (OR=23.74) on the presence of MERS-CoV, followed
by a higher annual mean temperature (OR=4.05) and a higher camel density (OR=1.80). Forest coverage and cropland coverage were excluded from the multivariable logistic analysis because of their high correlations with bareland coverage (R>0.6). Furthermore, no significant two-way interaction was found among bareland coverage, population density, annual mean temperature and camel density.
5. An initial analysis showed that sequences from bats and hedgehogs formed a separate clade distant from the main clade of sequences from humans and camels (including Lama glama), confirming that the camel is the zoonotic reservoir of MERS-CoV for spillover to humans. Sequences from humans and camels were mixed throughout the whole tree, indicating multiple introduction events from camels to humans. The root ancestor of clade C (C1-C5), dated back to January 2007, was 49.3% likely to be from camels and 50.7% likely to be from humans. The phylogeographic analysis showed that the spatiotemporal transmission pattern of clade C was characterized by intense local migration within the Middle East and occasional long-distance exportation. The three most likely locations of the inferred root ancestor were Riyadh in Saudi Arabia, the Nile Delta region and Jordan, with posterior probabilities of 31%, 17% and 12%, respectively. Riyadh appeared to be the major source exporting infections both locally and internationally. It was estimated with 99% posterior probability to be the location of the common ancestral node of subclades C3, C4 and C5, which cover 97.5% of the collected sequences. Intense migration of the virus from Riyadh to other cities in Saudi Arabia, to Abu Dhabi in the United Arab Emirates and to Europe started during 2011-2012. Abu Dhabi soon joined Riyadh as the second hub exporting the virus to other Middle Eastern cities as well as to Europe. The opportunistic exportation events from the Middle East to the United States in 2014 and to East Asia in 2015 were correctly captured by the model. For positive selection, we identified eight amino acid positions in
the spike glycoprotein potentially associated with positive selection, three of which are novel sites. After correlating the smoothed regional morbidity and other indicators with the human sequences, we found that case fatality rates differed between clades in the phylogenetic tree, and that C5 was associated with a higher case fatality rate than the other clades. However, when the clade information was matched to the individual level, the effect of clade on death was not statistically significant.

Conclusions
1. In recent years, the proportion of MERS cases with animal contact, which had a higher case fatality rate than other cases, has shown a clear upward trend. More attention should therefore be paid to the monitoring, prevention, control and treatment of MERS in animals, as well as to cases arising from zoonotic infection.
2. MERS has a high case fatality rate. Elderly patients (≥65), men, Middle Eastern residents and cases with underlying diseases were all at higher risk of death. The effect of animal contact history varied by age group and sex. High-risk populations should receive strengthened monitoring and education, and because of their potential for severe outcomes, intensive treatment should be prepared for them as soon as they are diagnosed.
3. The transportation network was the leading driver of the spatial diffusion of the disease. Active surveillance and screening of infected travelers at transportation hubs such as international airports are still urgently needed, especially in areas with convenient transportation or frequent contact with the epidemic area.
4. Based on the model-predicted risk map, we found that ecologically suitable areas span the Middle East, West Asia, all of North Africa and a small part of East Africa, a much wider range than the region with reported human MERS cases. These findings provide a valuable understanding of the ecology of MERS-CoV and can help public health agencies target surveillance and interventions at high-risk regions in order to control zoonotic infections more efficiently and to prevent
a potential pandemic.
5. The camel was the zoonotic reservoir of MERS-CoV for spillover to humans, with multiple introduction events from camels to humans. Detection reagents should be improved to better recognize the currently dominant sub-clade C5. The novel sites found in this study could give direction to further studies on potential targets of antivirals and vaccines against MERS-CoV.

Innovations
1. A cross-continental, fine-scale risk assessment of the ecological niches of MERS was carried out based on multiple factors. In our study, we not only constructed a model to predict the spatial distribution of the animal host (camel), but also included both the distribution of human cases with local animal contact and MERS-CoV-positive animal specimens in our model, to provide a reliable risk prediction map for the ecological niches of MERS-CoV. To obtain the best predictive performance, we adopted the stacking method to integrate multiple base machine learning models, which improved the precision and accuracy of the prediction. No risk assessment study with this design has been reported in the currently published literature.
2. The spatial diffusion pattern of MERS and its associated socioenvironmental drivers were studied. In our study, spatial trend surface analysis of reported cases and phylogeographic analysis of sequences corroborated each other, jointly revealed the spatial diffusion pattern of MERS, and quantitatively estimated the effects of risk factors such as traffic and hospital distribution. The results have important scientific significance for clarifying the main directions for monitoring and for guiding the prevention and control of MERS.
3. An in-depth, multi-level analysis was carried out with comprehensive research data. The research combined case data, pathogen data and related biological, environmental and sociological information to form three updated MERS-related databases at the
individual, group and molecular levels. Our results were pieced together to form a complete picture of where this pathogen originated and how it spreads, thereby improving the current understanding of the epidemiological characteristics of MERS, identifying vulnerable groups and high-risk areas, and providing references for the formulation of local prevention and control measures in different regions.
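
As a worked check on the headline figure in the Results, the case fatality rate can be recomputed from the reported counts (802 deaths among 2450 laboratory-confirmed cases). The sketch below also attaches a Wilson 95% confidence interval. The interval method and the helper name `wilson_interval` are assumptions made for illustration only; the abstract reports just the point estimate and does not say how, or whether, an interval was computed.

```python
from statistics import NormalDist


def wilson_interval(events: int, total: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (hypothetical helper)."""
    if total <= 0 or not 0 <= events <= total:
        raise ValueError("need total > 0 and 0 <= events <= total")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = events / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * (p * (1 - p) / total + z**2 / (4 * total**2)) ** 0.5 / denom
    return center - half, center + half


# Counts taken from the Results section of the abstract.
deaths, cases = 802, 2450
cfr = deaths / cases
low, high = wilson_interval(deaths, cases)

print(f"CFR = {cfr:.2%}")
print(f"95% Wilson interval (assumed method): {low:.2%} to {high:.2%}")
```

The point estimate reproduces the reported 32.73%. The Wilson interval is used here only because it behaves well for proportions; any other binomial interval could be substituted.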
Keywords/Search Tags:Middle East Respiratory Syndrome, Risk factors, Machine learning, Risk assessment, Phylogeography