Construction And Research Of Chinese Electronic Medical Record Named Entity Recognition Corpus

Posted on: 2021-07-19 | Degree: Master | Type: Thesis
Country: China | Candidate: Y B Liu
GTID: 2504306038971209 | Subject: Chinese medicine informatics
Abstract/Summary:
Objective: Years of national investment in hospital information systems have meant that large volumes of medical data are entered, stored, and repeatedly retrieved, yet the processing of that data remains a weak point. Earlier research concentrated on data mining and data analysis and paid less attention to processing and analyzing the text itself. Advances in artificial intelligence, in particular natural language processing, are changing this situation, and that is the main direction of this research. Natural language processing includes word segmentation, part-of-speech tagging, semantic analysis, and other research directions. This study draws on machine learning techniques for word segmentation, part-of-speech tagging, and named entity recognition. It covers desensitization of electronic medical records, development of a labeling specification, construction of a labeled corpus, and training of an automatic labeling model. The aim is to explore how well natural language processing techniques and recent neural network algorithms work on Chinese electronic medical records, and to accumulate experience for the eventual construction of a knowledge graph and the realization of intelligent diagnosis and treatment.

Method: The subject of this research is named entity recognition, which can be viewed as a special case of the segmentation and classification tasks in natural language processing: proper nouns of a particular domain are labeled manually, and machine learning algorithms are then trained on those labels so that annotation can be done automatically. In the data preparation stage, patients' personal information in the medical records was desensitized manually. In the specification stage, a draft labeling specification was written with reference to specifications published in the literature and adapted to the research object of this study, and trial labeling was carried out with the labeling tool developed by the research team. After several rounds of consistency analysis, the specification was revised iteratively according to the results and then fixed. Construction of the formal named entity corpus began once the specification was settled. When the corpus was complete, a BiLSTM-CRF model was trained on it and then evaluated on the test set.

Result:
(1) Data preparation. Desensitization concealed patients' names, home addresses, phone numbers, and other personal information not relevant to this study, and the text format of the electronic medical records was adjusted to suit the project. Cleaning yielded a total of 150 electronic medical record text documents.
(2) Formulation of the labeling specification. Annotators with a medical background labeled records by following the named entity labeling specification developed in this study. After two iterations, the consistency evaluation gave an F value greater than 0.8, and the specification was finalized as an electronic medical record named entity labeling specification suited to this study.
(3) Development of the annotation tool. An annotation tool was developed and successfully implements the functions the study requires.
(4) Labeling of named entities. Using the established labeling specification and the developed labeling tool, 100 first disease course records were labeled and a named entity corpus was built.
(5) Model training. A BiLSTM-CRF model was trained on the corpus.
(6) Testing. The trained model was evaluated on the test data set, giving an F value of 78.41%.

Conclusion: The experimental results show that the collected data are not comprehensive enough to cover the medical records of most clinical departments. The labeling specification meets the experimental objectives. The annotation tool meets the needs of the experiment, but there is still room for improvement. Training on the corpus indicates that the corpus is reasonably accurate, and the test-set evaluation shows that the trained model performs well, although its accuracy can be improved further. During corpus construction the annotators were mainly roommates and classmates, who were not always rigorous in their labeling, so the labeled results contain some errors. Although the consistency evaluation was passed, precision and recall did not reach their best possible values. Building the entity corpus is the top priority in named entity recognition, and higher test accuracy is not possible without an accurately labeled corpus. This is one of the reasons why the final test-set accuracy in this thesis is not high enough. In other researchers' work, the same neural network algorithm reaches a test-set accuracy close to 0.9, so this research still has considerable room for improvement. In summary, building on named entity recognition for Chinese electronic medical records, this thesis extends the work to traditional Chinese medicine named entities in the electronic medical records of traditional Chinese medicine hospitals, and shows that, under the same technical framework, recognition of traditional Chinese medicine named entities can still achieve good experimental results.
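The abstract reports F values twice (above 0.8 for annotator consistency, 78.41% on the test set) without stating how they were computed. The sketch below shows one common way to obtain such a figure, assuming BIO-style tags and exact-match scoring at the entity level. The tag names `SYM` and `DIS` and the helper names `bio_to_spans` and `prf` are hypothetical and are not taken from the thesis.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive); a hypothetical representation
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed tagging scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes an entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def prf(reference: Set[Span], candidate: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F value of candidate against reference."""
    matched = len(reference & candidate)
    precision = matched / len(candidate) if candidate else 0.0
    recall = matched / len(reference) if reference else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Two annotators (or gold labels and model output) tagging the same six tokens.
annotator_a = ["B-SYM", "I-SYM", "O", "B-DIS", "I-DIS", "O"]
annotator_b = ["B-SYM", "I-SYM", "O", "B-DIS", "O", "O"]
print(prf(bio_to_spans(annotator_a), bio_to_spans(annotator_b)))  # (0.5, 0.5, 0.5)
```

For a consistency evaluation, one annotator's spans serve as the reference and the other's as the candidate. For model testing, the gold corpus is the reference and the model output is the candidate. Exact matching is strict: in the example the second entity counts as a miss because its boundary differs by one token.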
Keywords/Search Tags:TCM, CEMRs, labeling standards, named entity recognition, BiLSTM-CRF