With the rapid development of technology in drug research, research on traditional Chinese medicine has also intensified. By combining modern production technology with traditional Chinese medicine theory, more and more new varieties and dosage forms of Chinese medicine preparations are being developed to treat related clinical diseases. As a result, the number of reported adverse drug reactions (ADRs) is increasing year by year, and the safety of Chinese medicines has attracted widespread attention at home and abroad. This article aims to use machine learning algorithms to identify, from extremely imbalanced data, the key factors that may cause adverse reactions to a particular Chinese medicine injection, so as to guide clinical medication and reduce the harm of ADRs to patients as much as possible. The data set comes from 48 hospitals across the country and consists of monitoring data on inpatients who received the injection during the medication cycle. Adverse reactions are rare in this data set, accounting for 0.18% of the total sample, so the ratio of positive to negative samples is severely imbalanced. At the data level, the article introduces and applies a variety of resampling methods, such as SMOTE and EasyEnsemble, to weaken the imbalance between positive and negative samples. At the algorithm level, it adds cost-sensitive learning to traditional classification models, assigning a greater misclassification cost to minority-class samples so that the classifier pays more attention to them. It also uses one-class learning, in the form of One-Class SVM, to transform the binary classification problem into a novelty detection problem and classify the extremely imbalanced data. With AUC and the Kappa coefficient as evaluation metrics, a comparison of the different method combinations shows that the logistic regression model built after random mixed sampling achieves the highest classification accuracy and balances the identification of positive and negative samples.
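To make the sampling and evaluation ideas concrete, the following is a minimal pure-Python sketch, not the article's implementation. It assumes that "random mixed sampling" means randomly undersampling the majority class while randomly oversampling the minority class with replacement. The function names, the toy data, and the target of 500 rows per class are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def random_mixed_sample(rows, labels, n_per_class, seed=0):
    """Assumed form of random mixed sampling: bring every class to
    n_per_class rows by undersampling large classes and oversampling
    small ones."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) >= n_per_class:
            chosen = rng.sample(members, n_per_class)     # undersample, no replacement
        else:
            chosen = rng.choices(members, k=n_per_class)  # oversample, with replacement
        out_rows.extend(chosen)
        out_labels.extend([label] * n_per_class)
    return out_rows, out_labels


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), agreement corrected for chance."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data with the 0.18% positive rate quoted in the text.
labels = [1] * 18 + [0] * 9982
rows = [[i] for i in range(len(labels))]

bal_rows, bal_labels = random_mixed_sample(rows, labels, n_per_class=500)
print(Counter(bal_labels))

# A classifier that always predicts "no ADR" looks accurate but has no skill.
always_negative = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_negative)) / len(labels)
print(accuracy, cohen_kappa(labels, always_negative))
```

On this toy data, a classifier that always predicts the negative class reaches 99.82% accuracy yet has a Kappa of zero. This is one reason chance-corrected and ranking-based metrics such as Kappa and AUC suit data this imbalanced better than raw accuracy.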