Construction Of A Prediction Model For Common Pediatric Diseases Based On Ensemble Learning And Unbalanced Multi-label Data Sets

Posted on:2019-03-07

Degree:Master

Type:Thesis

Country:China

Candidate:D X Huo

GTID:2434330563957654

Subject:Computer technology

Abstract/Summary:

Disease prediction usually requires collecting a certain number of clinical medical records as a data set. The symptom descriptions in the records serve as the features of each example, the initial diagnosis serves as the disease label, and data mining and machine learning algorithms are used to construct a disease prediction model. However, medical sample data are often imbalanced, which leads to poor model predictions. Given the imbalanced, multi-label nature of medical data sets, this thesis uses ensemble learning algorithms to build a prediction model for common pediatric diseases. The model combines sampling with AdaBoost and uses a tree generated from maximum mutual information.

The method has three steps. First, the binary relevance (BR) strategy splits the experimental data set of common pediatric diseases into one binary data set per label. Second, for each binary data set, reliable minority-class samples are replicated a certain number of times, within a specified threshold, during AdaBoost's iterative training; this yields a prediction model for each single disease label. Finally, the outputs of all single-disease-label models are combined using a tree built from the maximum mutual information between labels. At prediction time the spanning tree is traversed. For each node, the maximum of the node's own predicted probability and the product of its parent's predicted probability with the mutual information between the parent and the node is taken as the node's updated predicted probability. Labels whose probability meets an appropriate threshold are added to the result set.

In the experiments, different sampling techniques were compared with the single-disease-label prediction model on two single-disease-label binary data sets and three public imbalanced binary data sets. The results show that the model's accuracy, recall, and F1 score improved to varying degrees. On the experimental data set of common pediatric diseases, the prediction model was compared with the mainstream multi-label algorithm ML-KNN. Across three types of evaluation metrics the model outperformed the compared algorithm, which indicates that the approach is effective for predicting common pediatric diseases from imbalanced multi-label data sets.
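The label-tree step described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the toy label matrix, the choice of root label, and the 0.5 threshold are all assumptions made for illustration, and the per-label probabilities are simply given as inputs rather than produced by the sampling-plus-AdaBoost models. The update rule follows one reading of the abstract, namely taking the larger of a node's own probability and its parent's probability multiplied by their mutual information.

```python
import math
from collections import Counter

def binary_relevance_split(X, Y):
    """BR strategy: one (features, binary target) data set per label."""
    return [(X, [row[j] for row in Y]) for j in range(len(Y[0]))]

def mutual_information(a, b):
    """Mutual information (in nats) between two binary label columns."""
    n = len(a)
    joint, pa, pb = Counter(zip(a, b)), Counter(a), Counter(b)
    return sum((c / n) * math.log((c / n) / ((pa[u] / n) * (pb[v] / n)))
               for (u, v), c in joint.items())

def max_mi_tree(Y, root=0):
    """Prim's algorithm for a maximum spanning tree over labels.

    Edge weights are pairwise mutual information. The root is an
    assumption; the abstract does not say how it is chosen.
    Returns (parent, order, mi) with parents visited before children.
    """
    k = len(Y[0])
    cols = [[row[j] for row in Y] for j in range(k)]
    mi = [[mutual_information(cols[i], cols[j]) for j in range(k)]
          for i in range(k)]
    parent, order = {root: None}, [root]
    while len(order) < k:
        # Pick the heaviest edge from the tree to a label not yet in it.
        i, j = max(((i, j) for i in order for j in range(k) if j not in parent),
                   key=lambda e: mi[e[0]][e[1]])
        parent[j] = i
        order.append(j)
    return parent, order, mi

def propagate(probs, parent, order, mi):
    """Update each node to max(own prob, parent prob * MI(parent, node))."""
    out = list(probs)
    for j in order:
        p = parent[j]
        if p is not None:
            out[j] = max(out[j], out[p] * mi[p][j])
    return out

def predict_labels(probs, threshold=0.5):
    """Keep labels whose (updated) probability reaches the threshold."""
    return [j for j, p in enumerate(probs) if p >= threshold]

# Toy data (hypothetical): 4 records, 2 features, 3 disease labels.
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
Y = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 0]]

per_label_sets = binary_relevance_split(X, Y)
parent, order, mi = max_mi_tree(Y)
# Stand-in for the outputs of the per-label classifiers on one new record.
adjusted = propagate([0.9, 0.3, 0.1], parent, order, mi)
labels = predict_labels(adjusted)
```

In this toy example labels 0 and 1 always co-occur, so their mutual information is high and a confident prediction for label 0 raises the probability of label 1 above the threshold. Because mutual information between binary labels measured in nats never exceeds ln 2, the product stays below the parent's probability.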

Keywords/Search Tags:

Machine learning, Pediatric common diseases, Imbalance, Multi-label, Mutual information

Related items

1	Research On The Key Problems In Automatic Diagnosis Of Ophthalmic Diseases Based On Machine Learning
2	Research On Recommendation Method Of Traditional Chinese Medicine Prescription Based On Machine Learning
3	Prediction Of Drug-target Interactions Based On Multi-information Fusion And Machine Learning
4	Breast Mass Detection From The Digitized Mammograms Based On Machine Learning
5	Predicting Interspecies Transmission And Antigenic Relationship Of Influenza A Viruses Based On Machine Learning Methods
6	Machine Learning Methods For B-cell Epitope Prediction Based On Side-chain Information Of Protein
7	The Correlation Between Gut Microbiome And Common Disease And The Establishment Of Constipation Prediction Model
8	Study On Establishment And Application Strategy Of Registries For Pediatric Diseases
9	Application Of Machine Learning Algorithms For The Diagnosis Of Sarcopenia
10	Prediction Of Unenhanced Lesions Evolution In Multiple Sclerosis Using Radiomics-based Models: A Machine Learning Approach