
Analysis Of Influencing Factors And Risk Assessment Of Gastrointestinal Diseases Among Middle-aged And Elderly Chinese Based On Data Warehouse

Posted on: 2022-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: P Zhang
GTID: 2504306488460414
Subject: Software engineering
Abstract/Summary:
Chronic diseases have long been a serious problem for the health of middle-aged and elderly people, and gastrointestinal diseases directly affect patients' daily life, diet, and other aspects of living. As China's population ages, the burden on public health care is also increasing. Although there is much research on the pathology of gastrointestinal diseases, there is little on their spatial distribution and influencing factors. This study of the spatial distribution, influencing factors, and risk assessment of gastrointestinal diseases among middle-aged and elderly people in China is therefore a useful supplement to existing work, with scientific significance and practical application value.

Taking into account the development of modern data and information technology, this study starts by building a data warehouse themed on gastrointestinal diseases and combines it with geographic information systems (GIS) and machine learning algorithms, following the principles of spatial epidemiology. The research data come from the China Health and Retirement Longitudinal Study (CHARLS), conducted by the National School of Development at Peking University. First, statistical and spatial analysis are used to study the prevalence of gastrointestinal diseases and to identify the characteristics of the susceptible population, its geographic range, and the affected zones. Second, hypothesis testing in the SPSS statistical software is used to determine the statistically significant risk factors. Finally, based on the identified factors, classifiers are compared under several resampling methods, the better-performing classifier is chosen to build ensemble learning classifiers, and the ensemble classification results are mapped in GIS software as a risk map. The results can identify potential high-risk areas and evaluate the risk of each area. The main contents of this study are as follows.

(1) Construction of a data warehouse on the theme of gastrointestinal diseases in middle-aged and elderly people. A Hadoop cluster was set up and a data acquisition module based on the Flume and Kafka frameworks was configured. To make the data flow in the warehouse clearer and more orderly, this study designed a four-layer logical model. Each layer has a different function, and the original data are completely separated from the final data that are extracted and used, which increases the robustness of the system and reduces data coupling.

(2) Visual statistical analysis of the disease situation. Data tables were extracted from the data application layer of the data warehouse, and a visualization system with a B/S (browser/server) architecture was built to display the regional, age, and gender distribution of the sick population and the prevalence rate in each region. The analysis and visualization showed that patients were mainly aged between 45 and 64, and that more women than men had the disease. The areas with more patients and a higher prevalence were all in southwest China. In some other regions, such as Jiangxi Province, the eastern part of Inner Mongolia, and Hebei Province, the prevalence rate was also high and concentrated. Elsewhere, for example in the eastern coastal areas, only a few high-prevalence areas appeared sporadically.

(3) Screening of relevant influencing factors and analysis with a traditional logistic regression model. Data were extracted from the service data layer of the data warehouse and imported into SPSS for hypothesis testing and traditional logistic regression modeling. The t-test was used for numerical data and the chi-square test for categorical data. Factors that were significant at the P<0.10 level in the hypothesis tests were entered into the logistic model. The influencing factors were screened further in SPSS when the logistic model was fitted, and 20 significantly correlated influencing factors were finally obtained. Among them, physical condition had the most significant impact on the disease: the worse the physical condition, the higher the risk of disease. Emotional factors such as sleep quality, feeling happy, and worrying about small things also had a significant influence. Specifically, positive emotional factors had a protective effect, while negative emotional factors increased the risk. In addition, the incidence of gastrointestinal diseases was correlated with a variety of other chronic diseases, which suggests a concurrent association among chronic diseases.

(4) Construction of a risk assessment model for gastrointestinal diseases based on the screened influencing factors, and drawing of the risk map. First, data from the service data layer of the data warehouse were processed with three resampling methods. After each resampling, logistic regression, decision tree, and support vector machine (SVM) models were built in Python for classification and prediction. Under all three resampling methods, the decision tree performed better than the other two classifiers, so it was selected as the base classifier for homogeneous ensemble learning. In the ensemble learning stage, a random forest model, an ensemble voting classifier, and an ensemble stacking classifier were built to classify and predict the original data. The results show that the random forest model and the ensemble stacking classifier performed well, with a fitting accuracy of about 83%. Based on these classification results, a risk map of gastrointestinal diseases in China was drawn. The map showed that southwest China is a high-risk area for gastrointestinal diseases, and that other regions also contain some high-risk areas, which are relatively scattered and cover small areas.

By building a data warehouse, this study analyzes the distribution of gastrointestinal diseases among middle-aged and elderly people in China and the factors that influence them, and establishes a risk assessment model that simulates the risk in each area. The results can provide a scientific basis and an information-based decision-making tool for public health departments to allocate public health resources rationally and to formulate prevention strategies.
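
The screening rule described in part (3), a chi-square test for a categorical factor with P<0.10 as the entry threshold, can be illustrated with a minimal sketch. The thesis performs this step in SPSS; the Python below is only a stand-in, restricted to binary factors (2x2 tables) and without continuity correction. The function names, factor names, and counts are hypothetical and do not come from the thesis or from the survey data.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    table = [[a, b], [c, d]]: rows are the two levels of a factor,
    columns are diseased / not diseased. Returns (statistic, p_value).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def screen_factors(tables, alpha=0.10):
    """Keep the factors whose chi-square p-value is below alpha."""
    kept = {}
    for name, table in tables.items():
        stat, p_value = chi_square_2x2(table)
        if p_value < alpha:
            kept[name] = (stat, p_value)
    return kept


# Hypothetical counts, made up for illustration only.
example_tables = {
    "poor_sleep": [[60, 140], [40, 260]],
    "urban_residence": [[50, 200], [50, 200]],
}
selected = screen_factors(example_tables)
for name, (stat, p_value) in selected.items():
    print(f"{name}: chi2={stat:.2f}, p={p_value:.4g}")
```

In this made-up example only `poor_sleep` passes the threshold and would be carried forward to the logistic model. Numerical factors would need a t-test instead, which is not shown here.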
Keywords/Search Tags:Middle-aged and elderly, Gastrointestinal diseases, Data warehouse, Spatial autocorrelation, Spatial analysis, Ensemble learning, Risk simulation