As a valuable data resource, electronic medical records (EMRs) contain a large amount of accurate and detailed patient information. Named entity recognition (NER) and causal relationship extraction (CRE) for Chinese electronic medical records (CEMRs) can provide theoretical support for the construction of clinical knowledge bases, decision support systems, and similar applications. Existing methods do not fully capture the text semantics of nested entities, resulting in low accuracy. To solve this problem, this dissertation studies named entity recognition and causal relationship extraction for CEMRs. Accuracy is improved by deeply mining the semantic information of nested entities and the features of radicals. The specific research contents are as follows:

(1) CEMRs contain many nested entities, resulting in low accuracy of named entity recognition. To address this problem, an improved NER model for CEMRs based on CCRFs is proposed. The BERT (Bidirectional Encoder Representations from Transformers) model is used to construct the text feature set and obtain rich semantic information. The Bi-LSTM (Bi-directional Long Short-Term Memory) model processes the feature set to obtain local features. The attention mechanism assigns higher weights to local features related to entities. The CCRFs (Cascaded Conditional Random Fields) model analyzes the local features to avoid errors caused by long-distance dependencies and produces the final NER results for CEMRs. The experimental results show that the F1 values of the proposed model on the CCKS 2017 and CCKS 2018 datasets are 92.00% and 89.31%, respectively, a significant improvement over existing models.

(2) Furthermore, the specialized vocabulary of CEMRs is closely related to character radicals, a property that existing methods do not exploit, which leads to low accuracy of causal relationship extraction. Building on the proposed named entity recognition model, a radical-aware causal relationship extraction model for CEMRs is further proposed. First, the radicals of named entities are obtained from the Xinhua Dictionary dataset. To deeply capture the semantics of characters, the Word2Vec model is used to extract radical features and the BERT model is used to extract character-level features. Finally, the two kinds of features are concatenated and passed into the extraction model to obtain the extraction results. On the Q&A2019, Yidu-S4K, and CHIP2020 datasets, the F1 values are 82.1%, 81.69%, and 77.99%, respectively.

The experimental results show that the models proposed in this dissertation can effectively identify named entities and extract causal relationships among entities, providing theoretical and technical support for the construction of medical knowledge graphs, online diagnosis and treatment platforms, and similar applications.

Figures [28], Tables [15], References [82]
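The feature fusion step described in (2) can be illustrated with a minimal sketch. It is not the dissertation's implementation: the radical table `RADICALS` is a hypothetical four-character stand-in for the Xinhua Dictionary lookup, and `toy_embedding` is a deterministic placeholder for the learned Word2Vec radical vectors and BERT character vectors. The function names and vector sizes are assumptions chosen for illustration. Only the concatenation of the two feature types follows the text above.

```python
import hashlib
from typing import List

# Hypothetical toy radical table (assumption). The dissertation obtains
# radicals from the Xinhua Dictionary dataset instead.
RADICALS = {"肝": "月", "痛": "疒", "咳": "口", "炎": "火"}


def toy_embedding(token: str, dim: int) -> List[float]:
    """Deterministic placeholder vector (assumption).

    Stands in for a learned embedding such as Word2Vec or BERT output;
    it carries no semantic information. dim must be at most 32.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dim)]


def fuse_features(text: str, char_dim: int = 8, radical_dim: int = 4) -> List[List[float]]:
    """Concatenate a character-level vector with a radical vector per character."""
    fused = []
    for ch in text:
        char_vec = toy_embedding("char:" + ch, char_dim)
        # Characters missing from the table fall back to a shared unknown radical.
        radical = RADICALS.get(ch, "<unk>")
        radical_vec = toy_embedding("radical:" + radical, radical_dim)
        fused.append(char_vec + radical_vec)  # list concatenation
    return fused


features = fuse_features("肝炎")
print(len(features), len(features[0]))
```

Each character yields one fused vector whose length is the sum of the two feature sizes. In a real system this sequence would be passed to the downstream extraction model.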