| In recent years, as the government and the public have paid increasing attention to health care, many enterprises and research institutions have moved into intelligent medicine and are jointly advancing products such as intelligent diagnosis and treatment, medical Q&A, and clinical decision support. In medical settings, the volume of text such as medical literature, clinical test reports, electronic medical records, and medical insurance records has grown significantly, and this text contains a large amount of information that can be mined and used. Most of it, however, is unstructured or semi-structured, which hinders analysis and use, so how to analyze and exploit these data more effectively is a current research focus and challenge. Term extraction and term alignment are key components of medical data analysis: they help medical personnel analyze medical information and also provide underlying support for the accuracy of downstream AI medical products. To make better use of medical data and to meet the practical need of intelligent medical products for better performance, this paper designs and implements a term extraction and alignment platform for the medical domain. The main work is as follows: (1) Based on the characteristics of entity terms in the medical domain, open-domain named entity recognition models are surveyed, and deep learning methods are applied to term extraction from electronic medical records. A character-level BiLSTM-CRF named entity recognition model is used to extract 15 types of high-frequency medical entity terms from Chinese electronic medical records. Taking into account prediction time, overall extraction accuracy across entity types, and model size, the extraction performance of BERT-CRF and BERT-BiLSTM-CRF models is also compared, and the character-level BiLSTM-CRF model is ultimately selected for the platform. (2) Because medical terms are specialized, colloquial, and varied in expression, they are difficult to align directly. Term standardization is therefore cast as a two-stage task of recall and ranking to align terms of the "disease" type. Since a traditional single recall method cannot fully recall the correct candidate concept terms, this paper proposes a multi-strategy recall scheme, which greatly improves the recall rate. To rank the candidate terms, implicit semantic ranking models based on Siamese-LSTM, BERT-base, BERT-wwm-ext, RoBERTa-base, and RoBERTa-wwm-ext are constructed. To further improve the semantic ranking model, strategies for constructing positive and negative samples are compared and hard negative samples are introduced. This greatly improves ranking performance, and the final accuracy of the term standardization model is 0.865. (3) At present, most medical information platforms rely mainly on named entity recognition systems. Although these can convert unstructured medical information into structured information to a certain extent, the varied and colloquial expression of medical terms introduces errors during analysis and use. To address this problem, a term extraction and alignment platform for the medical domain is constructed. The platform is built with Vue, Flask, and MySQL and provides file upload, file parsing, medical term extraction, medical term alignment, and visual display. After rigorous testing, all platform functions pass. The results show that the term extraction and alignment platform constructed in this paper is effective and practical. |
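
The two-stage recall-and-ranking formulation of term standardization described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the three recall strategies (substring match, character-bigram Jaccard overlap, and edit-style similarity), every function name, and the sample vocabulary are assumptions chosen for illustration. The ranking stage uses a simple string-similarity scorer as a stand-in for the BERT-family semantic ranking models named in the abstract.

```python
from difflib import SequenceMatcher


def char_ngrams(text, n=2):
    """Return the set of character n-grams of text (the text itself if shorter than n)."""
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def recall_by_substring(mention, vocabulary, top_k):
    """Hypothetical strategy 1: terms that contain the mention or are contained in it."""
    hits = [t for t in vocabulary if mention in t or t in mention]
    return hits[:top_k]


def recall_by_jaccard(mention, vocabulary, top_k):
    """Hypothetical strategy 2: highest character-bigram Jaccard overlap."""
    grams = char_ngrams(mention)

    def score(term):
        other = char_ngrams(term)
        return len(grams & other) / len(grams | other)

    return sorted(vocabulary, key=score, reverse=True)[:top_k]


def recall_by_edit_similarity(mention, vocabulary, top_k):
    """Hypothetical strategy 3: highest difflib similarity ratio."""
    def score(term):
        return SequenceMatcher(None, mention, term).ratio()

    return sorted(vocabulary, key=score, reverse=True)[:top_k]


def multi_strategy_recall(mention, vocabulary, top_k=3):
    """Stage 1: union of the candidates from every strategy, duplicates removed."""
    strategies = (recall_by_substring, recall_by_jaccard, recall_by_edit_similarity)
    candidates = []
    for strategy in strategies:
        for term in strategy(mention, vocabulary, top_k):
            if term not in candidates:
                candidates.append(term)
    return candidates


def placeholder_scorer(mention, term):
    """Stand-in for a learned semantic ranking model (assumption, not the paper's model)."""
    return SequenceMatcher(None, mention, term).ratio()


def normalize(mention, vocabulary, scorer=placeholder_scorer, top_k=3):
    """Stage 2: rank the recalled candidates and return the best one, or None."""
    candidates = multi_strategy_recall(mention, vocabulary, top_k)
    ranked = sorted(candidates, key=lambda t: scorer(mention, t), reverse=True)
    return ranked[0] if ranked else None
```

For example, with the illustrative vocabulary `["type 2 diabetes mellitus", "hypertension", "acute bronchitis", "chronic gastritis"]`, calling `normalize("type 2 diabetes", vocabulary)` returns `"type 2 diabetes mellitus"`. Taking the union of several recall strategies makes it less likely that the correct standard term is missing from the candidate set, and that is the only stage at which the ranking model could otherwise lose it. A learned scorer would replace `placeholder_scorer` through the `scorer` argument.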