| Electronic medical records (EMRs) are detailed records of medical activities and contain a wealth of valuable clinical knowledge. Chinese medical entity recognition extracts medical information from EMRs, laying the foundation for research on medical quality management, intelligent healthcare, and related areas. Traditional supervised learning methods rely on large volumes of annotated training data, but building a large-scale annotated corpus of Chinese EMRs is costly. This paper therefore applies active learning to Chinese medical entity recognition, which partly alleviates the shortage of training data. The research covers three main areas.

(1) Chinese medical entity recognition based on conditional random fields (CRF). After analyzing the linguistic characteristics of EMRs, we improve the feature extraction stage of the CRF algorithm by adding word-end features to the standard contextual and character features. Following the i2b2 2010 annotation guidelines, we constructed a small-scale annotated corpus of 300 EMRs. Trained on this self-built corpus, the improved CRF reached an F1 score of 0.933, 0.6% higher than the traditional CRF and more than 10% higher than other classical models (Hidden Markov Model and Bidirectional Long Short-Term Memory), demonstrating the advantage of the improved CRF.

(2) Chinese medical entity recognition based on active learning. Active learning improves model performance through iterative training in which training samples are selected in a targeted way. We construct the active learning query scenario with pool-based sampling and propose an uncertainty-based sampling strategy. With models trained on a small dataset, the F1 score of the active-learning-based entity recognition model was 3% higher than that of the random sampling model, demonstrating that active learning can effectively improve model performance when training data are limited.

(3) Design and implementation of a Chinese EMR management and analysis system. Beyond basic storage and management functions, the system provides structured display and online annotation of records. It supports researchers' data analysis of EMRs and also facilitates the construction of annotated corpora. |
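The CRF feature set described in area (1) can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the window size of two, the feature names, and the `ENTITY_SUFFIXES` lexicon are hypothetical, and "word-end feature" is interpreted here (an assumption) as whether the current character belongs to a lexicon of characters that commonly close a medical term. Each character becomes a dict of features, the per-token form that CRF toolkits such as sklearn-crfsuite accept as input.

```python
from typing import Dict, List

# Hypothetical lexicon of characters that often end a Chinese medical term,
# mapped to an entity type. Illustrative only; not taken from the paper.
ENTITY_SUFFIXES: Dict[str, str] = {
    "炎": "disease",    # inflammation, e.g. 胃炎 (gastritis)
    "癌": "disease",    # cancer
    "痛": "symptom",    # pain, e.g. 头痛 (headache)
    "术": "treatment",  # surgery, e.g. 切除术 (resection)
}


def char_features(sentence: str, i: int, window: int = 2) -> Dict[str, object]:
    """Feature dict for the character at position i (character-level tagging)."""
    ch = sentence[i]
    feats: Dict[str, object] = {
        "bias": 1.0,
        # Character features: identity and simple type checks.
        "char": ch,
        "is_digit": ch.isdigit(),
        "is_ascii_alpha": ch.isascii() and ch.isalpha(),
    }
    # Contextual features: neighbouring characters inside the window,
    # padded at sentence boundaries.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        feats[f"char[{offset:+d}]"] = sentence[j] if 0 <= j < len(sentence) else "<PAD>"
    # Bigram of the previous and current character.
    feats["bigram[-1]"] = (sentence[i - 1] if i > 0 else "<BOS>") + ch
    # Word-end features (assumed interpretation): does this character
    # commonly close a medical term, and of which type?
    suffix_type = ENTITY_SUFFIXES.get(ch)
    feats["is_word_end"] = suffix_type is not None
    feats["word_end_type"] = suffix_type if suffix_type is not None else "none"
    return feats


def sentence_features(sentence: str) -> List[Dict[str, object]]:
    """One feature dict per character: the per-sentence input a CRF trains on."""
    return [char_features(sentence, i) for i in range(len(sentence))]
```

For example, in `sentence_features("患者胃炎")` the last character 炎 gets `is_word_end=True` and `word_end_type="disease"`, giving the CRF a direct cue that an entity boundary is likely at that position.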
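Area (2) combines pool-based sampling with an uncertainty-based query strategy. The abstract does not spell out the paper's exact uncertainty measure, so the sketch below substitutes least-confidence sampling, a common choice for sequence labeling: each unlabeled sentence is scored by 1 - P(most likely tag sequence), and the highest-scoring sentences are sent for annotation. The `train`, `annotate`, and `best_path_prob` callables are hypothetical stand-ins for CRF training, a human annotator, and the CRF's Viterbi-path probability.

```python
from typing import Callable, List, Sequence, Tuple

# A labeled example: (sentence, per-character tag sequence such as BIO tags).
Example = Tuple[str, List[str]]


def least_confidence_query(
    pool: Sequence[str],
    best_path_prob: Callable[[str], float],
    batch_size: int,
) -> List[int]:
    """Indices of the batch_size pool sentences the model is least confident about.

    Score = 1 - P(most likely tag sequence | sentence); higher means more uncertain.
    """
    scored = [(1.0 - best_path_prob(sentence), idx) for idx, sentence in enumerate(pool)]
    # Stable sort: ties keep their original pool order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [idx for _, idx in scored[:batch_size]]


def active_learning_loop(
    labeled: List[Example],
    pool: Sequence[str],
    train: Callable[[List[Example]], Callable[[str], float]],
    annotate: Callable[[str], List[str]],
    rounds: int,
    batch_size: int,
) -> Tuple[List[Example], List[str]]:
    """Pool-based active learning: train, query, annotate, repeat."""
    labeled = list(labeled)
    remaining = list(pool)
    for _ in range(rounds):
        if not remaining:
            break
        # Retrain on the current labeled set; get the model's confidence function.
        best_path_prob = train(labeled)
        picked = set(least_confidence_query(remaining, best_path_prob, batch_size))
        # The oracle (a human annotator in practice) labels the queried sentences.
        for idx in sorted(picked):
            labeled.append((remaining[idx], annotate(remaining[idx])))
        # Remove queried sentences from the unlabeled pool.
        remaining = [s for idx, s in enumerate(remaining) if idx not in picked]
    return labeled, remaining
```

The random-sampling baseline used for comparison in the abstract would replace `least_confidence_query` with a uniform draw from the pool while leaving the rest of the loop unchanged, so the two strategies differ only in which sentences reach the annotator.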