
Application Study Of Data Mining In Intelligent Identification Of Metabolic Syndrome In Physical Examination Population

Posted on: 2019-03-10 | Degree: Master | Type: Thesis
Country: China | Candidate: C Yan
GTID: 2394330548956284 | Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objective: Data mining is widely applied in medicine, but it still faces many problems, two of the most important being data redundancy and the classification of class-imbalanced data. Taking the intelligent screening (classification) of metabolic syndrome as a starting point, this study explores the application value of Lasso feature selection and resampling techniques for handling data redundancy and class-imbalanced classification in the medical field.

Methods: Health examination records of 69,267 Han Chinese individuals were collected from a medical institution in Urumqi between 2014 and 2016. Metabolic syndrome status was used as the class label, and a range of physical examination indicators were used to predict it. The class imbalance ratio for metabolic syndrome was 1:24. Two widely used classifiers, the C4.5 decision tree and the BP (back-propagation) neural network, served as the intelligent classification methods, and classification performance was evaluated with the F-value, G-mean and AUC. Lasso (Least Absolute Shrinkage and Selection Operator) feature selection and three resampling techniques (random oversampling, random undersampling and hybrid sampling) were applied to the classification of metabolic syndrome. By comparing differences in classification performance and the stability of the classification results, the study evaluated the feasibility of Lasso feature selection and resampling techniques for reducing redundancy in medical data and for classifying class-imbalanced data.

Results: (1) Compared with the original examination data, Lasso effectively reduced data redundancy and improved classification performance. The number of examination variables fell from 53 to 5: blood glucose, high-density lipoprotein cholesterol, neutrophil percentage, age and mean platelet volume. (2) Computer simulation showed that imbalanced datasets hindered data mining performance, and that classification performance declined as the imbalance ratio increased. Classification of metabolic syndrome after resampling was better than on the original imbalanced dataset. The three resampling techniques performed differently, with random oversampling giving the largest improvement. The two classifiers also differed: the BP neural network outperformed the C4.5 decision tree. (3) Combining Lasso feature selection with resampling further optimized the classification of metabolic syndrome, and the C4.5 decision tree produced stable classification results.

Conclusion: (1) Lasso feature selection can effectively reduce redundancy in large-scale physical examination data and improve classification performance. Data mining can help identify potential medical indicators that were previously unrecognized and provide references for medical research. (2) Resampling techniques can improve classification performance, and random oversampling merits attention in practical applications. Combining multiple data mining techniques has potential application value for medical data mining, information discovery and disease classification.
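The abstract evaluates the classifiers with the F-value, G-mean and AUC rather than plain accuracy. The minimal sketch below shows how these metrics are commonly defined for a binary case/non-case problem, using only the Python standard library. It assumes the F-value is the F1 score of the minority (metabolic syndrome) class (β = 1 by default; the thesis may weight it differently) and computes AUC by its pairwise, Mann-Whitney definition. All function names are hypothetical, and the toy labels are illustrative, not thesis data.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN), treating `positive` as the minority (case) class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f_value(y_true, y_pred, positive=1, beta=1.0):
    """F-measure of the minority class: weighted harmonic mean of precision and recall."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (recall on cases) and specificity (recall on non-cases)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def auc(y_true, scores, positive=1):
    """AUC as the probability that a random case scores above a random non-case (ties count 1/2).
    Pairwise comparison is O(n_pos * n_neg): fine for illustration, slow for large datasets."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# A classifier that labels everyone "no metabolic syndrome" on data with a 1:24 ratio.
y_true = [1] + [0] * 24
y_all_negative = [0] * 25
accuracy = sum(t == p for t, p in zip(y_true, y_all_negative)) / len(y_true)
print(accuracy, f_value(y_true, y_all_negative), g_mean(y_true, y_all_negative))
# -> 0.96 0.0 0.0
```

The usage lines illustrate why imbalance-aware metrics matter at a 1:24 ratio: a classifier that never predicts a case reaches 96% accuracy while its F-value and G-mean are both zero.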
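Lasso adds an L1 penalty to a regression fit, which drives some coefficients to exactly zero; the variables whose coefficients stay nonzero are the ones selected, which is how a 53-variable set can shrink to a handful. The abstract does not say which Lasso variant or penalty value (λ) was used. The sketch below assumes the least-squares objective (1/2n)·||y - Xb||² + λ·||b||₁, solved by cyclic coordinate descent with soft-thresholding. For a binary outcome such as metabolic syndrome, an L1-penalized logistic model is a common alternative, and λ is usually tuned by cross-validation. Function names are hypothetical, and the orthogonal toy design is chosen only so the answer can be checked by hand.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def standardize(X):
    """Centre each column and scale it to unit variance (1/n convention)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = []
    for j in range(p):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        sds.append(var ** 0.5 or 1.0)  # guard against constant columns
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.
    X should be standardised; y is centred internally so the intercept is not penalised."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    r = [yi - y_mean for yi in y]  # residual for b = 0
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the partial residual that excludes feature j.
            rho = sum(X[i][j] * (r[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta  # keep r = y_centred - X b up to date
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


# Toy orthogonal design (Sylvester-Hadamard columns, values +/-1):
# y depends strongly on x0 and x1, weakly on x2, and not at all on x3 and x4.
X = [[(-1) ** bin(i & j).count("1") for j in range(1, 6)] for i in range(8)]
y = [3 * row[0] - 2 * row[1] + 0.5 * row[2] for row in X]
print(lasso_cd(standardize(X), y, lam=1.0))
# -> [2.0, -1.0, 0.0, 0.0, 0.0]
```

At λ = 1 the two strong predictors survive with shrunken coefficients, while the weak one (x2) and the irrelevant ones are set exactly to zero; with λ = 0 the fit reduces to ordinary least squares.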
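The three resampling techniques named in the abstract can be sketched with the Python standard library as follows. Random oversampling duplicates minority rows with replacement, and random undersampling discards majority rows without replacement. The abstract does not describe its hybrid scheme, so the `hybrid_sample` function below is an assumption: it oversamples the minority class and undersamples the majority class to the midpoint of the two class sizes. All names are hypothetical. In general, resampling is applied only to the training split so that evaluation still reflects the real class ratio.

```python
import random


def _split(X, y, minority):
    """Pair each row with its label and separate minority (case) rows from majority rows."""
    minority_rows = [(x, t) for x, t in zip(X, y) if t == minority]
    majority_rows = [(x, t) for x, t in zip(X, y) if t != minority]
    return minority_rows, majority_rows


def _unzip(rows, rng):
    """Shuffle the resampled rows and return them as separate X and y lists."""
    rng.shuffle(rows)
    return [x for x, _ in rows], [t for _, t in rows]


def random_oversample(X, y, minority=1, seed=0):
    """Duplicate randomly chosen minority rows (with replacement) until both classes are equal."""
    rng = random.Random(seed)
    mino, majo = _split(X, y, minority)
    extra = [rng.choice(mino) for _ in range(len(majo) - len(mino))]
    return _unzip(mino + extra + majo, rng)


def random_undersample(X, y, minority=1, seed=0):
    """Keep all minority rows plus an equal-sized random subset (without replacement) of majority rows."""
    rng = random.Random(seed)
    mino, majo = _split(X, y, minority)
    return _unzip(mino + rng.sample(majo, len(mino)), rng)


def hybrid_sample(X, y, minority=1, seed=0):
    """ASSUMED hybrid scheme: oversample the minority and undersample the majority to the midpoint size."""
    rng = random.Random(seed)
    mino, majo = _split(X, y, minority)
    target = (len(mino) + len(majo)) // 2
    extra = [rng.choice(mino) for _ in range(target - len(mino))]
    return _unzip(mino + extra + rng.sample(majo, target), rng)


# Toy data with the 1:24 ratio reported in the abstract: 4 cases and 96 non-cases.
X = list(range(100))
y = [1] * 4 + [0] * 96
for name, resample in [("oversample", random_oversample),
                       ("undersample", random_undersample),
                       ("hybrid", hybrid_sample)]:
    _, y_res = resample(X, y)
    print(name, y_res.count(1), y_res.count(0))
# oversample 96 96
# undersample 4 4
# hybrid 50 50
```

The trade-off is visible in the counts: oversampling keeps every majority row but repeats the few cases many times, while undersampling discards most of the majority data.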
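One of the two classifiers in the study is the C4.5 decision tree, which chooses splits by gain ratio: the information gain of an attribute divided by its split information. The sketch below computes this criterion for a single categorical attribute. It is an illustration only: full C4.5 also handles continuous attributes through thresholds, deals with missing values and prunes the tree, none of which is shown. Function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature, labels):
    """C4.5 split criterion for a categorical attribute: information gain / split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


labels = [1, 1, 0, 0]
print(gain_ratio(["a", "a", "b", "b"], labels))  # binary split that separates the classes -> 1.0
print(gain_ratio(["w", "x", "y", "z"], labels))  # ID-like attribute with the same gain -> 0.5
```

Both attributes have an information gain of 1 bit, but the ID-like attribute is penalized by its larger split information, which is the reason C4.5 uses gain ratio instead of raw information gain.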
Keywords/Search Tags: Imbalanced data, Data mining, Lasso feature selection, Resampling technology, Machine learning