
Research On Multi-label Classification Algorithm Of Chronic Disease Based On Association Analysis And Ensemble Learning

Posted on: 2022-10-28 | Degree: Master | Type: Thesis
Country: China | Candidate: J Q Hou | Full Text: PDF
GTID: 2494306551956639 | Subject: Master of Engineering
Abstract/Summary:
Chronic noncommunicable diseases have become the greatest threat to human health in recent years. Because their pathogenesis is complex, the time of onset of a chronic disease is difficult to determine. Early screening and diagnosis are recognized as an effective way to reduce the risk of chronic diseases. As the health awareness of Chinese citizens has grown, the number of people who undergo regular medical checkups has increased greatly, and with the build-out of medical information systems, most medical institutions now hold large amounts of medical data. Against this background, increasingly mature artificial intelligence technology provides a powerful tool for the early screening and diagnosis of chronic diseases, and studies on various types of chronic diseases have steadily increased in recent years.

In this thesis, we took medical examination data as the research object and addressed two problems in existing chronic disease studies: insufficient consideration of the relationships between diseases, and prediction limited to a single outcome. We conducted an in-depth analysis of 10 chronic diseases, including obesity, fatty liver, and diabetes, proposed a multi-label prediction model for multiple chronic diseases, and evaluated and validated its performance on the dataset. The main research contents and results of this thesis are as follows:

(1) Diagnostic text-based disease extraction model. Owing to improper operation, negligence, and other factors, the same disease is described in different ways in the dataset. We proposed a Word2Vec-based disease label extraction model to solve this problem. A large medical corpus was collected and used to train a medical word vector model, on which the disease extraction model was built.

(2) Multi-disease association rule mining based on clustering analysis. K-Means and DBSCAN were each used to cluster 64 diseases, and the FP-growth algorithm was then used to mine association rules among the diseases in each cluster. Finally, we analyzed the associations between different diseases.

(3) Bagging-based multi-label classifications embedding model. There are many dependencies between chronic diseases, and most patients suffer from several chronic diseases at the same time. Most existing studies focus on single-disease prediction and do not fully consider the associations between different chronic diseases. This thesis therefore used multi-label learning algorithms to predict multiple diseases simultaneously. First, we proposed a multi-label neural network model (ML-NN) to predict diseases. To further improve performance, we then proposed the Bagging-based multi-label classifications embedding model (BMCE). In BMCE, the ML-NN proposed in this thesis and two classical multi-label algorithms are each first ensembled by bagging, and the outputs of the three bagging models are then combined by stacking to form the final multi-label prediction model. Experiments show that the proposed ML-NN performs better than other multi-label models, and that BMCE further improves prediction performance, outperforming common multi-label models. Finally, the experiments demonstrate that combining BMCE with disease association rules improves model performance further.
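
As a rough illustration of the association-mining step in (2), the sketch below derives support and confidence for pairwise disease rules. It is not the thesis's implementation: the thesis uses FP-growth within each cluster, whereas this sketch uses brute-force pair counting, which is adequate only for small examples and only covers rules between two diseases. The function name `mine_pair_rules`, the toy patient records, and both thresholds are assumptions made up for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(records, min_support=0.4, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for disease pairs.

    Brute-force pair counting, NOT FP-growth; a hypothetical helper
    for illustration only.
    """
    n = len(records)
    item_counts = Counter()
    pair_counts = Counter()
    for diseases in records:
        unique = sorted(set(diseases))
        item_counts.update(unique)                    # per-disease frequency
        pair_counts.update(combinations(unique, 2))   # co-occurrence frequency

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical toy records: one set of diagnosed diseases per patient.
records = [
    {"obesity", "fatty liver", "diabetes"},
    {"obesity", "fatty liver"},
    {"obesity", "fatty liver", "hypertension"},
    {"diabetes", "hypertension"},
    {"fatty liver"},
]

rules = mine_pair_rules(records)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

On these toy records only the obesity and fatty liver pair clears the support threshold, so the sketch reports a rule in each direction with different confidence values. Rules of this kind are what the abstract describes combining with BMCE; how that combination is done is not specified here.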
Keywords/Search Tags:chronic noncommunicable diseases, disease prediction, disease association rule, multi-label learning, data mining