
Analyzing Risk Factors For Multi-Diseases With Data Mining

Posted on: 2016-04-27  Degree: Master  Type: Thesis
Country: China  Candidate: L Y Ma  Full Text: PDF
GTID: 2308330482460434  Subject: Software engineering
Abstract/Summary:
According to statistics from the World Health Organization, chronic diseases have the highest mortality worldwide every year. Coronary heart disease (CHD), hypertension, hyperlipidemia and diabetes are four major chronic diseases. Studying their risk factors is therefore important for disease prevention, and this paper builds a risk factor analysis and disease prediction system based on data mining. The main work is as follows.

First, an improved algorithm for C4.5, Knn-DT, is designed. Knn-DT outperforms C4.5 with an F-measure that is 2.5% higher on average.

Second, an improved algorithm for logistic regression, QN-LR, is designed. QN-LR converges 3.25 times and 8 times faster than the DFP and BFGS improved algorithms, respectively.

Third, an improved algorithm for the BP neural network, MODF-BNN, is designed. LM, one of the MODF-BNN algorithms, achieves a high F-measure of 90.9% for hypertension, and its iteration speed is 7 times and 3 times faster than the Rprop and BFGS improved algorithms, respectively. SCG, another MODF-BNN algorithm, achieves high F-measure values of 91.9%, 90.3% and 88.5% for hyperlipidemia, diabetes and CHD, respectively.

Moreover, Knn-DT, MODF-BNN_LM and MODF-BNN_SCG perform well in analyzing risk factors for hypertension; Knn-DT and QN-LR perform well in analyzing risk factors for diabetes; and MODF-BNN_SCG performs well in analyzing risk factors for all four diseases.

Furthermore, the paper uses gradient boosted regression trees (GBRT) to build prediction models for the four chronic diseases, and applies random forest (RF) to initialize the input of GBRT in order to address overfitting. The resulting RF-GBRT method is shown to perform better than GBRT alone.

Finally, the paper builds the "Risk Factors Mining And Diseases Prediction System" in the Java programming language.
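
The results above are reported as F-measure values. The abstract does not say which variant is used, so the following minimal sketch assumes the common balanced F1 form (the harmonic mean of precision and recall, with beta = 1). The function names `confusion_counts` and `f_measure` and the sample labels are hypothetical illustrations and are not taken from the thesis.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives and false negatives
    for the class labelled `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, fn


def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from confusion counts.

    beta = 1 gives the balanced F1 score; this choice is an assumption,
    since the abstract does not state which F-measure it reports.
    """
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined),
        # so the score is conventionally taken as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up labels for illustration only: 1 = disease present, 0 = absent.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
tp, fp, fn = confusion_counts(y_true, y_pred)
score = f_measure(tp, fp, fn)
```

Because the score combines precision and recall, a classifier cannot reach a high value by favoring only one of them, which makes it a reasonable summary metric when diseased and healthy cases are unevenly represented.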
Keywords/Search Tags: data mining, risk factors, disease predictive model, chronic disease