| In recent years, the disease burden of ischemic stroke in our country has grown increasingly serious. Etiology classification of ischemic stroke is important both for treatment and for related clinical research. To ease the workload of clinicians and to improve the accuracy and consistency of etiology classification, automatic etiology classification of ischemic stroke has emerged. However, existing automatic classification models have several shortcomings: inconsistent classification standards, inconvenient operation, poor interpretability, and a limited ability to process text data. Advances in computer-interpretable guidelines and natural language processing (NLP) provide a new opportunity to improve these models. The main contents of this study are as follows: 1. Extracting the diagnosis and treatment process from the clinical guideline. This study proposes a graphical representation method for diagnosis and treatment processes that defines different types of semantic blocks in a clinical guideline and thereby supports graphical modeling. Taking the etiology classification process of ischemic stroke as an example, we explain in detail how to extract the diagnosis and treatment process from a textual clinical guideline. 2. Representing the etiology classification guideline for ischemic stroke. GDL2, the computerized clinical guideline model proposed by openEHR, was used to represent the clinical knowledge and reasoning logic of the etiology classification guideline, and a computer-interpretable guideline for ischemic stroke etiology classification was constructed. Virtual data were used to verify the logical soundness and operability of the computer-interpretable guideline. 3. Structuring the EMR data of patients with ischemic stroke. The computer-interpretable guideline requires structured input, which must be obtained from patients' electronic medical record (EMR) data. This study focuses on structuring the text data in the EMR, namely identifying the cardiogenic embolic source from echocardiography reports and identifying the location of DWI high signal from head MRI reports. A BERT-LSTM-CRF named entity recognition model and a relation extraction model based on machine reading comprehension were used to structure the EMR text. 4. Evaluating the performance of the ischemic stroke etiology classification model. Accuracy, recall, and F1 score were used to evaluate the NLP models. The F1 score was 93.99% for cardiogenic embolic source recognition and 91.39% for DWI high-signal location recognition. In addition, an evaluation data set for ischemic stroke etiology classification was constructed from case report data, and the classification model was evaluated on it. Finally, the guideline-knowledge-based etiology classification model was compared with other existing models. In conclusion, this study combines a computer-interpretable guideline model with natural language processing to construct an automatic etiology classification model for ischemic stroke based on guideline knowledge representation. Natural language processing was used to process real-world EMR data and map them to the clinical guideline, and the proposed model was verified on a patient data set constructed from case reports. This study provides technical support for building decision support systems that combine clinical guidelines with medical record data. |
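
The F1 scores reported above summarize how well predicted entities match annotated ones. As a minimal sketch, assuming the common entity-level definition in which a prediction counts as correct only when its span and label both match exactly (the abstract does not state the matching criterion used in the study), the score can be computed as follows. The function name, the `(start, end, label)` tuple format, and the labels `SOURCE` and `LOCATION` are hypothetical and chosen only for illustration.

```python
def precision_recall_f1(gold, predicted):
    """Entity-level precision, recall, and F1.

    Assumption: entities are (start, end, label) tuples and a prediction
    is a true positive only on an exact match of all three fields.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    # F1 is the harmonic mean of precision and recall.
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical annotations for one report (illustrative values only).
gold = [(0, 2, "SOURCE"), (5, 7, "SOURCE"), (9, 10, "LOCATION")]
predicted = [(0, 2, "SOURCE"), (9, 10, "LOCATION"), (12, 13, "LOCATION")]

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Exact-match scoring is the stricter choice; a partial-overlap criterion would give higher numbers on the same predictions, so the criterion matters when comparing reported scores.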