| Background Acute respiratory infectious diseases, chiefly coronavirus disease 2019 (COVID-19) and seasonal influenza, can cause global pandemics. When the COVID-19 epidemic broke out suddenly at the end of 2019, the population generally lacked immunity, so clarifying the infection status of the population, especially of key populations, became an urgent scientific and public health problem. The epidemic also exposed the inadequacy of traditional respiratory infectious disease surveillance in China, and it remains unclear whether COVID-19 will settle into an endemic, seasonal, or cyclical pattern like influenza. In addition, past COVID-19 prevention and control measures affected the level of influenza activity, so more accurate predictions of future influenza trends are needed. China has a well-established and well-functioning influenza surveillance system, and its long-term, extensive data can be used to improve traditional prediction techniques for respiratory infectious diseases. Therefore, taking COVID-19 and influenza as examples, this study analyzes the infection status of COVID-19 and establishes prediction models based on symptom surveillance, to support the future prevention and control of acute respiratory infectious diseases. Methods Wuhan was selected as the study area for COVID-19 infection, and data were obtained from a prospective long-term follow-up study. Participants in Wuhan were recruited by questionnaire; the baseline survey and blood sample collection took place in April 2020, and two follow-up surveys were conducted in June 2020 and October 2020. Basic information and blood samples were collected from people aged 40 years and above, who were divided into an underlying disease group and a non-underlying disease group. Descriptive analysis was used to describe the distribution of participants' age, sex, underlying diseases and
other factors. Logistic regression was used to analyze factors associated with COVID-19 infection, and blood samples were tested in the laboratory. Beijing was selected as the study area for influenza prediction modeling, and influenza sentinel surveillance data from week 26 of 2010 to week 25 of 2019 were collected. SARIMA models and hybrid LSTM deep learning prediction models were constructed using ILI% and ILI%*positive%. R², AIC and the explained variance score (EX) were used to compare the effects of different prediction points and prediction periods. Because multi-source heterogeneous data were available, this study collected influenza, Baidu search index, meteorological and social-factor data from week 26 of 2012 to week 25 of 2019 and arranged them in different combinations. We established a Multi-Attention-LSTM (MAL) model that can integrate multi-source heterogeneous data, and constructed models combining ILI% with other multi-source heterogeneous data and models combining ILI%*positive% with the same data, to identify the best combination for influenza prediction. The MAL model was compared with commonly used prediction models, namely random forest, XGBoost, LSTM and GRU, using R², EX, mean absolute error (MAE) and mean squared error (MSE) as evaluation metrics. Results After the first wave of COVID-19 in Wuhan, the seropositivity rate of anti-SARS-CoV-2 antibodies in people aged 40 years and above was 6.18%: 6.30% in people with underlying diseases and 6.12% in those without. Being retired, self-reported symptoms, hospital visits for fever or respiratory symptoms since December 2019, and contact with anyone with fever or respiratory symptoms since December 2019 were risk factors for
COVID-19 infection (OR > 1, P < 0.05). From April to December 2020, IgG titers of infected individuals in the underlying disease population decreased significantly over time, while neutralizing antibody titers remained stable. Regardless of underlying diseases, the neutralizing antibody positive rate and mean IgG titer were lower in asymptomatic than in symptomatic infected individuals. Prediction models were established for Beijing ILI% and ILI%*positive% data from week 26 of 2010 to week 25 of 2019. The SARIMA model performed best when the rising point of the trend before the influenza peak was taken as the prediction point, and ILI%*positive% was predicted better than ILI% (ILI%: R² = 0.59, AIC = -3495.07; ILI%*positive%: R² = 0.76, AIC = -3965.07). When the prediction period was extended from 1 week to 26 weeks, the R² and EX values of the hybrid LSTM model for both ILI% and ILI%*positive% remained above 0.75, indicating good and robust prediction. For the SARIMA model, only ILI%*positive% modeling performed well, with R² remaining above 0.75 up to a 26-week prediction period. The MAL model can standardize multi-source heterogeneous data and model them for prediction. Comparing models built on different combinations of multi-source heterogeneous data showed that, for ILI%, the combination ILI% + meteorology + social factors + "Influenza" Baidu search index performed best (EX 0.78, R² 0.76, MAE 0.08, MSE 0.01); for ILI%*positive%, the combination ILI%*positive% + meteorology + social factors + "Influenza" Baidu search index performed best (EX 0.74, R² 0.70, MAE 0.02, MSE 0.02). Compared with random forest, XGBoost, LSTM and GRU, the MAL model showed good prediction performance (R² and EX both exceeded 0.70, and MAE and MSE were both around 0.02). Conclusion In this study, we analyzed COVID-19 infection among people aged 40 years and above from April to December
2020, after the first wave of infection with the original COVID-19 strain in Wuhan. The seropositivity rate was 6.18% overall and 6.30% among people with underlying diseases. We explored factors that may affect COVID-19 infection in people aged 40 years and above, and risk factors for infection among those with underlying diseases. Long-term follow-up showed that IgG titers decreased significantly over time regardless of underlying diseases, while neutralizing antibody titers remained stable for up to 9 months. The neutralizing antibody positive rate and mean IgG titer were lower in asymptomatic than in symptomatic infected individuals. By characterizing changes in serum antibody levels in people aged 40 years and above, with and without underlying diseases, this study clarifies the infection status of this population and provides baseline data for the future prevention, control and prediction of COVID-19 and other acute respiratory infectious diseases. This study also used nine years of seasonal influenza surveillance data (week 26 of 2010 to week 25 of 2019) to build prediction models. Comparing different prediction points and prediction periods showed that modeling performed best when the prediction point was the rising trend point before the influenza peak. The hybrid LSTM model was better suited to long-term prediction and the SARIMA model to short-term prediction, so the modeling method can be chosen according to the length of the available data and the prediction period. The MAL deep learning prediction model was established to standardize and predict from multi-source heterogeneous data such as influenza, meteorology, Baidu search index and social factors, and the results showed that it had good prediction performance. The prediction models
established in this study, and the comparison results between models, can provide a scientific basis and methodological reference for exploring the development trends of acute respiratory infectious diseases and for building multi-channel, intelligent infectious disease prediction and early-warning technology. |
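The influenza models above use two weekly targets, ILI% and ILI%*positive%. The following is a minimal sketch of how such series could be derived from sentinel counts. It assumes the usual definitions: ILI% is ILI visits divided by all outpatient visits at sentinel hospitals, and positive% is the share of tested ILI specimens that are influenza-positive. The function names and input counts are hypothetical. Rescaling the product back to percent units (ILI% × positive% / 100) is also an assumption, not the study's documented preprocessing.

```python
from typing import List, Sequence


def percent(numerators: Sequence[int], denominators: Sequence[int]) -> List[float]:
    """Element-wise 100 * numerator / denominator for weekly counts."""
    if len(numerators) != len(denominators):
        raise ValueError("series must have the same length")
    return [100.0 * n / d for n, d in zip(numerators, denominators)]


def ili_times_positive(ili_pct: Sequence[float], pos_pct: Sequence[float]) -> List[float]:
    """Hypothetical composite: ILI% weighted by the influenza-positive share.

    Dividing by 100 keeps the result on the ILI% (percent) scale; this
    scaling is an assumption, not the study's stated definition.
    """
    if len(ili_pct) != len(pos_pct):
        raise ValueError("series must have the same length")
    return [i * p / 100.0 for i, p in zip(ili_pct, pos_pct)]


# Hypothetical counts for two sentinel weeks (not study data).
ili_visits, all_visits = [120, 300], [4000, 5000]
tested, positive = [100, 200], [10, 80]

ili_pct = percent(ili_visits, all_visits)    # [3.0, 6.0]
pos_pct = percent(positive, tested)          # [10.0, 40.0]
print(ili_times_positive(ili_pct, pos_pct))  # [0.3, 2.4]
```

Weighting ILI% by laboratory positivity yields a series that tracks influenza-attributable ILI rather than ILI from all causes.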
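The abstract uses R², EX, MAE and MSE to compare SARIMA, hybrid LSTM, MAL and the baseline models. The sketch below computes these metrics from scratch with the common definitions, which are the same ones behind scikit-learn's `r2_score`, `explained_variance_score`, `mean_absolute_error` and `mean_squared_error`. The function name and the toy series are hypothetical. This is not the study's evaluation code.

```python
from statistics import fmean, pvariance
from typing import Dict, Sequence


def regression_scores(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return R2, explained variance (EX), MAE and MSE for one forecast."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    resid = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = fmean(y_true)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true is constant; R2 and EX are undefined")
    ss_res = sum(r * r for r in resid)
    return {
        # 1 - residual sum of squares / total sum of squares
        "R2": 1.0 - ss_res / ss_tot,
        # 1 - Var(residuals) / Var(y_true): a constant bias does not lower it
        "EX": 1.0 - pvariance(resid) / pvariance(y_true),
        "MAE": fmean([abs(r) for r in resid]),
        "MSE": ss_res / len(resid),
    }


# Toy series (hypothetical): a forecast that is always 1 unit too high.
scores = regression_scores([1, 2, 3, 4], [2, 3, 4, 5])
print({k: round(v, 3) for k, v in scores.items()})
# {'R2': 0.2, 'EX': 1.0, 'MAE': 1.0, 'MSE': 1.0}
```

The toy forecast is off by a constant. EX therefore stays at 1.0 while R² drops to 0.2. This difference is one reason to report both: together they separate systematic bias from failure to follow the shape of the epidemic curve.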
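The abstract reports its risk factors as odds ratios from logistic regression. As a minimal sketch of that technique, the code fits a univariable model, logit P(seropositive) = b0 + b1·x for a binary exposure x, by Newton-Raphson. It then reports OR = exp(b1) with a Wald 95% confidence interval. The counts and function names are hypothetical, and this is not the study's analysis code. A real analysis would normally use a statistics package and adjust for covariates.

```python
import math
from typing import Sequence, Tuple


def fit_logistic_1var(x: Sequence[float], y: Sequence[int],
                      max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float, float]:
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson.

    Returns (b0, b1, se_b1), where se_b1 is the Wald standard error
    from the inverse Fisher information.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0           # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0   # Fisher information matrix entries
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve information @ step = score via the 2x2 inverse
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    # Var(b1) is the (1, 1) entry of the inverse information matrix
    return b0, b1, math.sqrt(h00 / det)


def odds_ratio_ci(b1: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio exp(b1) with a Wald confidence interval."""
    return math.exp(b1), math.exp(b1 - z * se), math.exp(b1 + z * se)


# Hypothetical cohort: 8 of 100 exposed and 5 of 200 unexposed are seropositive.
x = [1] * 100 + [0] * 200
y = [1] * 8 + [0] * 92 + [1] * 5 + [0] * 195
b0, b1, se = fit_logistic_1var(x, y)
print("OR = %.2f (95%% CI %.2f to %.2f)" % odds_ratio_ci(b1, se))
```

With a single binary predictor the model is saturated. In that case exp(b1) equals the cross-product ratio (8 × 195) / (92 × 5) ≈ 3.39, and the Wald standard error equals Woolf's sqrt(1/8 + 1/92 + 1/5 + 1/195). Both identities give a quick correctness check before adding covariates.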