
Construction Of A Prediction Model For Common Pediatric Diseases Based On Integrated Learning And Unbalanced Multi-label Data Sets

Posted on: 2019-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: D X Huo
GTID: 2434330563957654
Subject: Computer technology
Abstract/Summary:
Disease prediction usually requires collecting a certain number of clinical medical records as a data set. The symptom descriptions in the records serve as instance features and the initial diagnosis serves as the disease label; data mining and machine learning algorithms are then used to construct a disease prediction model. However, medical sample data are often imbalanced, which degrades prediction performance. Given the imbalanced, multi-label nature of medical data sets, this thesis uses ensemble learning algorithms to build a prediction model for common pediatric diseases.

The model combines sampling with AdaBoost and uses a tree generated from maximum mutual information. Specifically: first, the BR (binary relevance) strategy splits the experimental data set of common pediatric diseases into one binary data set per label. Second, for each binary data set, reliable minority-class samples are replicated during the AdaBoost training iterations, with the amount of replication kept within a specified threshold; this yields a prediction model for each single disease label. Finally, the outputs of all single-disease-label prediction models are combined using a tree built from the maximum mutual information between labels. At prediction time the spanning tree is traversed: for each node, the maximum of the node's own predicted probability and the product of its parent node's predicted probability with the mutual information between the two nodes becomes the node's updated predicted probability. Labels whose probability meets an appropriately chosen threshold are added to the result label set.

In the experiments, two single-disease-label binary data sets and three public imbalanced two-class data sets were used to compare different sampling techniques with the single-disease-label prediction model. The results show that the model's accuracy, recall, and F1 values improve to varying degrees. On the experimental data set of common pediatric diseases, the prediction model was compared with the mainstream multi-label algorithm ML-KNN. On three types of evaluation metrics the model outperforms the algorithms it was compared with, so the approach is effective for predicting common pediatric diseases from imbalanced multi-label data sets.
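
The label-splitting and tree-based prediction steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names, the choice of label 0 as the tree root, the use of raw empirical mutual information (in nats) as the edge weight, and the 0.5 threshold are all assumptions made for illustration, and the per-label probabilities are taken as given rather than produced by AdaBoost.

```python
import math
from collections import Counter

def binary_relevance_split(X, Y):
    """BR strategy: one (features, binary target) data set per label column."""
    return [(X, [row[j] for row in Y]) for j in range(len(Y[0]))]

def mutual_information(a, b):
    """Empirical mutual information (nats) between two label columns."""
    n = len(a)
    joint, pa, pb = Counter(zip(a, b)), Counter(a), Counter(b)
    return sum((c / n) * math.log(c * n / (pa[x] * pb[y]))
               for (x, y), c in joint.items())

def max_mi_spanning_tree(Y, root=0):
    """Prim's algorithm on pairwise label MI (assumed construction).

    Returns (parent map, top-down visit order, MI matrix).
    """
    k = len(Y[0])
    cols = [[row[j] for row in Y] for j in range(k)]
    mi = [[mutual_information(cols[i], cols[j]) for j in range(k)]
          for i in range(k)]
    parent, order, in_tree = {}, [root], {root}
    while len(in_tree) < k:
        # Pick the heaviest edge leaving the current tree.
        i, j = max(((i, j) for i in in_tree for j in range(k)
                    if j not in in_tree),
                   key=lambda e: mi[e[0]][e[1]])
        parent[j] = i
        in_tree.add(j)
        order.append(j)
    return parent, order, mi

def propagate(probs, parent, order, mi):
    """Top-down pass: p[node] = max(p[node], p[parent] * MI(parent, node))."""
    p = list(probs)
    for node in order[1:]:  # the root keeps its own probability
        par = parent[node]
        p[node] = max(p[node], p[par] * mi[par][node])
    return p

def predict_labels(p, threshold=0.5):
    """Keep every label whose (updated) probability meets the threshold."""
    return [j for j, v in enumerate(p) if v >= threshold]
```

In use, `probs` would hold the outputs of the single-disease-label models for one patient record; `propagate` then lets a confident parent label raise the probability of a strongly dependent child label before thresholding. Because raw mutual information between binary labels never exceeds ln 2, a real system might rescale the edge weights; the abstract does not say whether the thesis does so.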
Keywords/Search Tags: Machine learning, Common pediatric diseases, Imbalance, Multi-label, Mutual information