The Study Of HIV/AIDS Prediction And Control Model In Xinjiang Based On Data Mining Technology

Posted on: 2021-03-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D D Tang
GTID: 1364330602963188
Subject: Occupational and Environmental Health
Abstract/Summary:
Objective: To explore the application of data mining technology to the prediction and control of AIDS in Xinjiang. Data mining was used to predict the epidemic trend of AIDS, monitor the progress of HIV treatment, identify high-risk groups, and analyze high-risk behavior, in order to provide a reference for the prevention and control of AIDS in Xinjiang.

Methods: 1) A single ARIMA model and an ARIMA-GARCH model were established based on monthly HIV incidence data in Xinjiang from 2004 to 2016. The models were fitted to these data to evaluate their effectiveness, and the monthly HIV incidence in Xinjiang was then predicted for one year. 2) Longitudinal follow-up data of 506 children with HIV/AIDS who received antiretroviral treatment (ART) in Xinjiang from January 2007 to December 2015 were studied. Baseline characteristics, treatment time points, and treatment grouping characteristics were analyzed statistically to describe the baseline situation, immunological effect, virological effect, and growth and development of the study subjects. Single-factor and multiple-factor prediction models were established using generalized estimating equations (GEE) for the immunological index (CD4 cell count) and for the growth and development indices (height-for-age z-score, HAZ, and weight-for-age z-score, WAZ), respectively. 3) Sentinel surveillance data from 2009 to 2015 for three high-risk populations in Urumqi (injecting drug users, men who have sex with men, and female sex workers) were used, including demographic characteristics, sexual behavior, and serological test results. With age, marital status, and education level as input variables and HIV infection as the output variable, four predictive models were established for each of the three data sets. The confusion matrix, accuracy, sensitivity, specificity, precision, recall, and area under the ROC curve were used to evaluate the performance of the models, and the importance of predictive variables was
analyzed.

Results: 1) The monthly HIV incidence in Xinjiang from January to December 2017 was predicted, and the predicted incidence decreased month by month. The ARIMA-GARCH model was more precise than the single ARIMA model. 2) Baseline data for the 506 children with HIV/AIDS in Xinjiang showed that 258 (50.99%) were boys (mean age 7.62 years); most children were older than 5 years, the main route of infection was mother-to-child transmission, and the main clinical stages were stage I and stage II. The abnormal rates of CD4 cell count and viral load were 58.89% and 51.28%, respectively. The initial treatment regimen consisted of AZT + 3TC + NVP/EFV. CD4 cell count, platelets, hemoglobin, total cholesterol, triglycerides, aspartate transaminase, alanine transaminase, height, weight, HAZ, and WAZ increased with treatment duration; viral load, white blood cells, lymphocytes, clinical manifestations, and opportunistic infections decreased with treatment duration; blood glucose, serum creatinine, and blood urea nitrogen fluctuated with treatment duration. After 1 year of treatment, the CD4 cell count had increased by an average of 177/μL, which was 47.58% higher than before treatment. Viral load dropped from an average of 106500 copies per ml before treatment to 25 copies per ml after 1 year of treatment, well below the detection threshold of 50 copies per ml. The change in CD4 cell count before and after treatment differed significantly by age group, age at ART initiation, baseline CD4 cell count group, HAZ group, and initial treatment regimen (P<0.05). Compared with before treatment, the CD4 cell count increased more in the age group ≤5 years than in the age group >5 years; more in the group that started ART at ≤5 years than in the group that started ART at >5 years; most in the group with a baseline CD4 cell count ≥500; and, among the initial treatment regimens, most with the ABC regimen. The change in HAZ before and after treatment differed significantly by sex, age, baseline CD4 cell count, initial treatment regimen, WHO clinical stage, and trimethoprim/sulfamethoxazole use (P<0.05). Mean HAZ was higher in girls than in boys, higher in the >5 year old group than in the ≤5 year old group, higher in WHO clinical stage III/IV than in stage I/II, and higher in the trimethoprim/sulfamethoxazole group than in the non-use group. The change in WAZ before and after treatment differed significantly by sex, interval between diagnosis and start of ART, baseline CD4 cell count, WHO clinical stage, and trimethoprim/sulfamethoxazole use (P<0.05). Mean WAZ was higher in girls than in boys, and higher in the trimethoprim/sulfamethoxazole group than in the non-use group. The multi-factor GEE model for CD4 cell count showed that treatment duration (years) and baseline CD4 cell count level were the key factors influencing the immunological effect in children. The multi-factor GEE models for HAZ and WAZ showed that the key factors affecting children's growth and
development were treatment duration (years), age, age at ART initiation, and WHO clinical stage. 3) The best prediction results were obtained by the random forest algorithm, with a diagnostic accuracy of 94.4821% on the MSM dataset, 97.5136% on the FSW dataset, and 94.6375% on the IDU dataset. The K nearest neighbor algorithm reached a diagnostic accuracy of 91.5258% on the MSM dataset, 96.3083% on the FSW dataset, and 90.8287% on the IDU dataset. The diagnostic accuracy of the support vector machine was 94.0182%, 98.0369%, and 91.3571%, respectively. The decision tree algorithm was the worst of the four algorithms, with a diagnostic accuracy of 79.1761% on the MSM dataset, 87.0283% on the FSW dataset, and 74.3879% on the IDU dataset. The importance scores of the independent variables in the random forest model showed that age was the most important factor for identifying HIV infection among the three high-risk populations in Urumqi.

Conclusion: The ARIMA-GARCH model established in the first part of the study can fit the monthly HIV incidence data in Xinjiang, eliminate the ARCH effect in the sample data series, and correct the ARIMA model, while preserving the trend of the forecast monthly HIV incidence in Xinjiang. The second part of the study established a GEE model to identify the major factors influencing the immunological response and the growth and development of children with AIDS in Xinjiang. This method overcomes the shortcomings of other methods, which have strict data requirements and cannot analyze the correlation among measurements taken at different time points, and it supports statistical analysis and inference on the follow-up data of children receiving AIDS treatment in Xinjiang. In the third part, the model for identifying high-risk groups susceptible to HIV can identify infection accurately from a few important attributes. All three studies
show that data mining, as a new method of assisting disease screening and diagnosis, can help medical staff quickly screen for and diagnose AIDS from a large amount of information, monitor HIV treatment and disease progress, and identify high-risk groups, providing new technologies and methods for AIDS prevention and control.
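
The evaluation measures named in the Methods (confusion matrix, accuracy, sensitivity, specificity, precision, recall) can be illustrated with a minimal Python sketch. It is not the dissertation's code: the names `ConfusionMatrix`, `count_confusion`, and `summarize` are hypothetical, the labels in the example are invented for illustration, and HIV-positive is assumed to be coded as 1.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (1 = HIV-positive, an assumed coding)."""
    tp: int  # predicted positive, truly positive
    fp: int  # predicted positive, truly negative
    tn: int  # predicted negative, truly negative
    fn: int  # predicted negative, truly positive


def count_confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionMatrix:
    """Tally the four confusion-matrix cells from paired 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return ConfusionMatrix(tp=tp, fp=fp, tn=tn, fn=fn)


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def summarize(cm: ConfusionMatrix) -> Dict[str, float]:
    """Derive the standard measures from the confusion-matrix counts."""
    total = cm.tp + cm.fp + cm.tn + cm.fn
    sensitivity = _ratio(cm.tp, cm.tp + cm.fn)  # share of true positives found
    return {
        "accuracy": _ratio(cm.tp + cm.tn, total),
        "sensitivity": sensitivity,
        "recall": sensitivity,  # recall is another name for sensitivity
        "specificity": _ratio(cm.tn, cm.tn + cm.fp),
        "precision": _ratio(cm.tp, cm.tp + cm.fp),
    }


if __name__ == "__main__":
    # Invented labels for illustration only; not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
    for name, value in summarize(count_confusion(y_true, y_pred)).items():
        print(f"{name}: {value:.4f}")
```

Accuracy alone can look high when HIV-positive cases are rare in a dataset, which is one reason to report sensitivity and specificity alongside it.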
Keywords/Search Tags:Data mining, HIV/AIDS, Time series, Longitudinal study, Machine learning