Research And Application Of Data Mining In Massive Health Data Based On Gradient Boosting Algorithm

Posted on: 2018-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Liu
GTID: 2334330518496119
Subject: Electronics and Communications Engineering
Abstract/Summary:
The rise of "big health" invites us to re-examine the health of the human living environment and to pay closer attention to quality of life and physical health. At the same time, a variety of modern diseases develop without people noticing. With the rapid development of big data, data mining and artificial intelligence, and especially since the concept of "Internet plus Medical" was proposed, people can use data mining technology to uncover important information hidden in big health data and so provide a new approach to disease prevention.
Cancer progresses rapidly and its early symptoms are not obvious, which results in a low cure rate; the most effective countermeasure is to "detect early and treat early". Therefore, based on health examination data (routine blood tests, routine urine tests, questionnaire information, etc.), this thesis uses data mining algorithms such as Gradient Boosting to construct a model for risk screening and early diagnosis of major diseases, providing evidence for their diagnosis and treatment.
This thesis mainly studies the Gradient Boosting algorithm, the training method of a liver cancer screening model, and the implementation and evaluation of parallel computing for the Gradient Boosting algorithm. First, this thesis studies the Gradient Boosting algorithm within ensemble learning and proposes feature selection and sample balancing methods for health data. To handle the imbalanced data problem, it uses a sampling method based on SMOTE and adjusts the evaluation metrics used for validation. Second, for the liver cancer screening scenario, this thesis designs and trains a Gradient Boosting classifier and proposes a complete scheme for model feedback and optimization.
Finally, based on the XGBoost platform, this thesis implements a parallel version of the Gradient Boosting algorithm for the disease risk screening model, verifies the experimental results, and compares running time among other measures.
This thesis verifies that the Gradient Boosting algorithm can achieve disease screening and early diagnosis of critical diseases using health examination data, and that it outperforms random forest and logistic regression on the same data set. This thesis also uses the XGBoost platform to implement parallel computing that can cope effectively with the demands of mining massive health data. The research results are of positive significance for health care and disease prevention.
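The abstract mentions a sampling method based on SMOTE for the imbalanced screening data but gives no detail. As a rough illustration only, the sketch below shows the general SMOTE idea: each synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. The function name `smote_oversample`, the parameter values, and the toy feature values are assumptions made for this example; they are not taken from the thesis and do not represent its actual implementation or data.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples built by SMOTE-style interpolation.

    Hypothetical helper for illustration; not the thesis's code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base sample,
        # excluding the base sample itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Made-up toy data: two hypothetical examination features per person.
majority = [[5.0 + 0.1 * i, 1.0 + 0.05 * i] for i in range(20)]
minority = [[9.0, 3.0], [9.5, 3.2], [10.0, 3.1], [9.2, 2.9]]

# Generate enough synthetic minority samples to match the majority class size.
new_points = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

In practice oversampling of this kind is usually applied to the training split only, so that validation and test data keep the original class distribution.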
Keywords/Search Tags: gradient boosting, health data mining, imbalanced data, tumor screening