
Cancer Surveillance, Early Warning, Prevention And Control Management By Leveraging Internet Search Engine Data

Posted on: 2022-01-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C J Xu
Full Text: PDF
GTID: 1524307304473704
Subject: Health management
Abstract/Summary:
Objectives: Since the 20th century, rapid industrialization and urbanization, together with worsening population aging, have made chronic noncommunicable diseases such as cancer an important public health problem threatening human health. In 2020, there were 19.29 million new cancer cases and 9.96 million cancer deaths worldwide, and China ranked first in both new cancer cases and cancer-related deaths. Traditional population-based cancer registration is rigorous and complex: because data collection, compilation, quality control, and reporting follow strict procedures, the relevant epidemiological data are usually published with a lag of 3 years. In the era of big data, the Internet and information technology continue to develop and penetrate all fields of production and daily life. As the online world moves closer to the real world, cancer monitoring and early warning based on Internet search engine data has become possible. Based on the theory of information epidemiology and combined with classical epidemiological methods, this study developed Internet search strategies for cancers and explored the spatial and temporal distribution of cancer-related search data, as well as the characteristics, search behaviors, and search preferences of Chinese and Western Internet users, using Internet search engine data and cancer epidemiological data as sources. The study aimed to provide a theoretical basis and data support for identifying real-time indicators that map real-world cancer trends, so as to inform intervention measures and management policy recommendations.

Methods: The search data came mainly from Baidu Index (BI) and Google Trends (GT), and the disease data came mainly from the Global Burden of Disease (GBD) database and the Center for Disease Control and Prevention (CDC). The first chapter constructed cancer search strategies for Internet search engines, mainly using direct word selection, range word selection, and technical word selection. Boolean operations were used to determine the search paradigm of the search terms. A multicollinearity test was performed on the preliminarily selected search terms for each cancer to exclude terms that would cause collinearity and to determine the final included terms. The second chapter analyzed the correlation between Internet search engine data and actual cancer incidence and mortality data, and the spatio-temporal characteristics of cancer prevalence. Spearman rank correlation was used to analyze the correlation between Internet search data and cancer incidence and mortality rates, and a cointegration test was used to determine the stability of the correlation. Building on this analysis, the third chapter first used time series analysis to denoise the search data and extract the effective data. Multiple linear regression, the least squares method, and time series methods were then used to construct a lung cancer monitoring and early warning model that fits real-world lung cancer incidence and mortality rates, and the models were validated.

Results: Twenty-eight site-specific cancers were selected: lung, liver, stomach, esophageal, colon and rectum, pancreatic, breast, brain and nervous system, cervical, prostate, nasopharynx, bladder, gallbladder and biliary tract, lip and oral cavity, ovarian, larynx, kidney, uterine, thyroid, and testicular cancers, leukemia, non-Hodgkin lymphoma, squamous-cell carcinoma, multiple myeloma, malignant skin melanoma, Hodgkin lymphoma, mesothelioma, and basal-cell carcinoma. Search data collection and preprocessing methods were developed, and a corresponding search strategy was developed for each type of cancer. Starting from the search keywords, related search terms were added to form the final search paradigm. All English search keywords for cancers followed the disease names in the GBD database. The incidence of 26 types of cancer (all except nasopharynx and uterine cancers) was correlated with the Baidu Index, and all of these correlations were statistically significant (P<0.01). The mortality rates of stomach, esophageal, and testicular cancers were not correlated with the Baidu Index, whereas the correlations between the mortality rates of the other cancers and the Baidu Index were statistically significant (P<0.01). The mortality of leukemia, uterine cancer, and Hodgkin lymphoma was negatively correlated with the Baidu Index. After logarithmic transformation of the Baidu Index, incidence, and mortality data, the three time series were stationary after first-order differencing, and the Baidu Index was cointegrated with the incidence and mortality data, respectively, at the first-order difference level. Similar results were found when analyzing Google Trends data and cancer data. Prediction models for lung cancer incidence and mortality were constructed from Google Trends data, with goodness of fit reaching 0.97 and 0.95, respectively, indicating that search data can be used to predict lung cancer incidence and mortality well.

Conclusion: With the rapid development of Internet information technology, search engine data can map actual cancer prevalence trends in the real world. Search engine data are correlated with the incidence and mortality rates of most cancers. The results show that, while the release of cancer epidemiological data is usually delayed, the current epidemiological characteristics of cancers can be accurately evaluated and effective trend prediction achieved by monitoring real-time Internet search data in combination with existing cancer epidemiological data. The monitoring and early warning model constructed in this study makes a timely and full understanding of the cancer burden possible, helps improve the allocation of public health resources, and provides a scientific basis for formulating cancer prevention and control policies. When using Internet search engine data for cancer surveillance, early warning, and prevention and control management, we should rely fully on current policies, integrate and use multi-source Internet big data, analyze cancer-related health information on the Internet, and promote the application of public health big data. Strategies should be tailored to local conditions and to different populations, so as to achieve the goals of public health monitoring, implementation and evaluation of health interventions, and optimization of intelligent medical strategies.
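
The abstract reports that Spearman rank correlation was used to relate search volume to cancer incidence and mortality rates. The Python sketch below illustrates that kind of calculation and is not the dissertation's own code. The function names `average_ranks` and `spearman_rho` and the two short series are hypothetical, and the numbers are synthetic toy values rather than Baidu Index, Google Trends, or GBD data. The sketch computes only the coefficient and does not produce the P values reported in the abstract; a statistics library such as SciPy (`scipy.stats.spearmanr`) would typically be used for that.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("rho is undefined when a series is constant")
    return cov / (sd_x * sd_y)


# Synthetic yearly values, for illustration only (not real data).
search_index = [120, 135, 150, 160, 158, 175]
incidence_rate = [30.1, 31.0, 32.4, 33.0, 33.5, 34.2]

rho = spearman_rho(search_index, incidence_rate)
print(f"Spearman rho = {rho:.3f}")
```

Working on ranks rather than raw values means the coefficient measures monotonic association, which suits search indices and disease rates measured on unrelated scales.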
Keywords/Search Tags:Cancers, Information epidemiology, Internet search engines, Baidu Index, Google Trends, Search behaviors, Monitor and early warning