| Coronary atherosclerotic heart disease (CAD) is one of the major diseases that endanger human health, and its rising morbidity and mortality have become urgent problems for the medical community. In recent years, with the continuous improvement of medical informatization, a large amount of descriptive information about CAD patients has been stored in electronic medical records, which provides an important basis for predicting and evaluating patients’ major adverse cardiac events (MACE) during hospitalization. However, there are relatively few studies on predicting MACE from Chinese electronic medical records, and existing prediction results are biased because of the data imbalance in the recorded adverse events. In response to these problems, this paper uses natural language processing, deep learning, and related technologies to study MACE prediction methods on the basis of a constructed corpus, and at the same time addresses the data imbalance, in order to provide a reference for clinical practice. The main contributions of this paper are: (1) Patient features were extracted to construct a corpus of major adverse cardiac events. First, the admission records and progress notes of 5,965 patients in a tertiary hospital in Xinjiang were collected as experimental data, and the unstructured admission records were preprocessed. In combination with clinical practice, labeling guidelines applicable to adverse events were formulated and the labeling work was completed. Then, using a Bi-LSTM-CRF method combined with a feature merging strategy, 105 patient features were extracted from the admission records and the extraction was evaluated, reaching an F value of 0.914. Finally, four graduate students manually labeled the adverse events recorded in the patients’ progress notes. Statistics showed that there were 1,580 patients with 
adverse events and 4,385 patients with no adverse events, which completed the corpus. (2) To address the prediction bias caused by data imbalance, a prediction method combining re-sampling with an iterative enhancement framework is proposed. Weighted sampling is incorporated into the iterative enhancement framework: the Borderline SMOTE method oversamples the minority class, the majority class is under-sampled, and the resulting balanced subsets are used repeatedly to train weak classifiers, each correcting the samples misclassified previously, which together form a strong classifier. To evaluate the model’s prediction performance comprehensively, three data sets with different class ratios were constructed and the method was compared with three baseline models. The experimental results show that when the data are imbalanced, the evaluation index for the minority class stays between 0.690 and 0.705 and does not decrease as the imbalance increases. (3) To further improve prediction accuracy, a prediction method for major adverse cardiac events based on CNN-Bi-LSTM-Attention is proposed. To mine more context information from the features and to avoid the exploding or vanishing gradient problem of RNNs, the features are passed through CNN convolution and then used as the input of a Bi-LSTM neural network. An attention mechanism is then introduced to highlight the influence of key features and reduce interference from irrelevant ones. Experiments were conducted on the constructed corpus and on the publicly released Framingham data set. The results show that prediction accuracy is 4.1% higher than without the attention mechanism. In addition, according to an evaluation by clinicians, the top 15 risk factors for major adverse cardiac events extracted in this paper have guiding significance for clinical diagnosis. |
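
The re-sampling step in contribution (2) can be illustrated with a small sketch. This is not the paper's implementation: it is a simplified Borderline SMOTE written with the Python standard library only, and the function names (`k_nearest`, `borderline_oversample`, `balanced_subset`), the choice of `k = 5`, the binary-label assumption, and the "meet in the middle" target size are all assumptions made for illustration. In the method described above, a subset like this would presumably be redrawn in each round of the iterative framework before a weak classifier is trained on it.

```python
import math
import random


def k_nearest(idx, X, k):
    """Indices of the k samples closest to X[idx] (Euclidean), excluding itself."""
    others = [j for j in range(len(X)) if j != idx]
    others.sort(key=lambda j: math.dist(X[idx], X[j]))
    return others[:k]


def borderline_oversample(X, y, n_new, k=5, minority=1, rng=None):
    """Simplified Borderline SMOTE: synthesize n_new minority samples near the border."""
    rng = rng or random.Random(0)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    if len(minority_idx) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    danger = []
    for i in minority_idx:
        neighbours = k_nearest(i, X, k)
        n_major = sum(1 for j in neighbours if y[j] != minority)
        # "Danger" samples: at least half, but not all, neighbours are majority.
        if len(neighbours) / 2 <= n_major < len(neighbours):
            danger.append(i)
    seeds = danger or minority_idx  # assumption: fall back if no border sample exists
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(seeds)
        mates = [j for j in minority_idx if j != i]
        mates.sort(key=lambda j: math.dist(X[i], X[j]))
        j = rng.choice(mates[:k])  # one of the nearest minority neighbours
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


def balanced_subset(X, y, minority=1, k=5, seed=0):
    """Oversample the minority and under-sample the majority to a common size."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_idx = [i for i, label in enumerate(y) if label != minority]
    # Assumed target: both classes meet halfway between their original sizes.
    target = (len(min_idx) + len(maj_idx)) // 2
    n_new = max(0, target - len(min_idx))
    synthetic = borderline_oversample(X, y, n_new, k, minority, rng)
    kept_majority = rng.sample(maj_idx, min(target, len(maj_idx)))
    X_bal = [X[i] for i in min_idx] + synthetic + [X[i] for i in kept_majority]
    y_bal = [minority] * (len(min_idx) + len(synthetic)) + [y[i] for i in kept_majority]
    return X_bal, y_bal


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data: 40 majority samples (label 0) and 10 minority samples (label 1).
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
    X += [[rng.gauss(1.5, 1), rng.gauss(1.5, 1)] for _ in range(10)]
    y = [0] * 40 + [1] * 10
    X_bal, y_bal = balanced_subset(X, y)
    print(y_bal.count(1), y_bal.count(0))
```

Synthetic samples are created only by interpolating between existing minority samples, so no new labels or feature values outside the observed minority range are introduced. A production setup would more likely rely on an established re-sampling library than on this hand-written version.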