| With the large-scale application of Internet technology in the medical industry and the rapid growth of medical data, mining the hidden value of medical big data has become an important research topic. As the quality of life of Chinese residents improves, more and more people undergo physical examinations. Faced with a large volume of physical examination data, diagnosing diseases by manually reviewing each record is tedious work for doctors. Machine-learning-based data clustering and disease classification can therefore improve the efficiency of doctors’ diagnoses. Among the many diseases detected by physical examination, chronic diseases occupy a dominant position. Identifying the factors that influence chronic diseases, understanding their impact on the disease, and effectively controlling and preventing them is therefore of practical significance for saving medical resources and reducing the burden on families. Building on previous studies and using HIS data from a certain region, this article combines machine learning and visualization methods to build an algorithm framework and a visual analysis system for analyzing diseases. The main research work is as follows: (1) Cluster analysis of physical examination data: By comparing the clustering performance of K-means and the Gaussian mixture model on physical examination data, the Gaussian mixture model is selected for cluster analysis, and violin plots are then used to explore the distribution of data indicators in each cluster. Based on the distribution of these indicators and relevant medical knowledge, the physical examination data are divided into two categories (normal physical examinations and abnormal physical examinations). Finally, the clustering model is validated with evaluation indices on the physical examination data, which shows that the Gaussian mixture model clusters the disease data well. This method 
can help doctors quickly focus on abnormal records within a large volume of physical examination data and improve the efficiency of disease diagnosis. (2) Chronic disease classification: To classify chronic diseases, this article uses hypertension and diabetes data, selects 11 physiological indicators and 16 blood chemistry indicators, analyzes their feature importance for chronic disease classification, and selects 12 indicators according to that importance. These indicators are used to construct a decision tree model for classifying chronic diseases. The model is evaluated by accuracy, recall, and the confusion matrix; its accuracy reaches 88%, and the constructed decision tree performs well in classifying hypertension and diabetes. (3) Diabetes regression prediction: A visual analysis system for diabetes evolution and prediction is designed and implemented based on the living habits of diabetic patients. Fasting blood glucose is divided into four levels: normal, mild, moderate, and severe. A blood glucose evolution map is constructed to display the evolution of blood glucose per unit time and to analyze how strongly the patient’s living habits affect blood glucose per unit time. To further analyze the degree of influence, different machine learning algorithms are used to predict blood glucose changes under current living habits. Visual graphics are designed to provide multi-view, multi-angle display and interactive analysis for exploring how living habits differ across blood glucose changes and for analyzing and evaluating the predictive performance of the machine learning models. This makes it easier for diabetic patients to identify shortcomings in their living habits, improve them, and reduce the likelihood of severe blood glucose levels. Finally, a case study using chronic disease follow-up data demonstrates the effectiveness of the system in analyzing the evolution of diabetes and predicting it. |
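
The evaluation step in part (2) relies on accuracy, recall, and the confusion matrix. As a minimal sketch of how these three quantities relate, the following pure-Python example computes them from a list of true and predicted labels. The function names and the eight sample labels are hypothetical and chosen only for illustration; they are not the article's data or implementation, and the resulting numbers say nothing about the reported 88% accuracy.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return a nested dict: matrix[true_label][predicted_label] = count."""
    counts = Counter(zip(y_true, y_pred))
    return {t: {p: counts[(t, p)] for p in labels} for t in labels}


def accuracy(matrix):
    """Fraction of all samples that lie on the diagonal of the matrix."""
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(matrix[label][label] for label in matrix)
    return correct / total if total else 0.0


def recall(matrix, label):
    """Fraction of samples whose true class is `label` that were predicted as `label`."""
    actual = sum(matrix[label].values())
    return matrix[label][label] / actual if actual else 0.0


# Hypothetical labels for illustration only (not taken from the article's data).
labels = ["hypertension", "diabetes"]
y_true = ["hypertension"] * 4 + ["diabetes"] * 4
y_pred = [
    "hypertension", "hypertension", "hypertension", "diabetes",
    "diabetes", "diabetes", "diabetes", "hypertension",
]

cm = confusion_matrix(y_true, y_pred, labels)
print(cm)
print("accuracy:", accuracy(cm))
for label in labels:
    print(f"recall[{label}]:", recall(cm, label))
```

Reporting recall per class alongside overall accuracy is useful here because the two disease classes may be imbalanced in practice, in which case a high accuracy alone can hide poor detection of the smaller class.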