
Research On Medical Entity And Assertion Recognition Based On Active Learning And Semi-supervised Learning

Posted on: 2018-11-19
Degree: Master
Type: Thesis
Country: China
Candidate: R Q Wang
Full Text: PDF
GTID: 2334330536981924
Subject: Computer technology
Abstract/Summary:
As hospitals become increasingly digitized, the number of Chinese electronic medical records is growing rapidly. Research on recognizing named entities and their assertions in Chinese electronic medical records can lay a foundation for artificial intelligence research in the medical industry. However, the field lacks the large amount of labeled data that traditional supervised learning methods require, and labeling on a massive scale is costly. This thesis studies the recognition of entities and assertions in Chinese electronic medical records using active learning and semi-supervised learning. The research has three main aspects:

(1) Recognition of entities and their assertions in Chinese electronic medical records based on traditional supervised learning methods. We extracted text features from a small amount of labeled data, then trained entity recognition models based on Conditional Random Fields (CRFs) and assertion classification models based on support vector machines (SVMs).

(2) Recognition of entities and assertions in Chinese electronic medical records based on Active Learning. Before each training iteration, the method selects data that the existing model has not learned adequately and adds it to the training set, so that a higher-performing model can be obtained from a small amount of labeled data. Traditional Active Learning methods focus only on how informative a sample is and ignore whether the sample is an outlier. To address this problem, the thesis proposes a selection optimization strategy for Active Learning based on the digital characteristics of electronic medical records, which reduces the probability of selecting outliers. Experiments show that the proposed method is more effective than the previous uncertainty-based Active Learning method.

(3) Recognition of entities in Chinese electronic medical records based on Semi-Supervised Learning. Semi-Supervised Learning is introduced into entity and assertion recognition in Chinese electronic medical records so that abundant unlabeled data can improve a model trained on labeled data. The Semi-Supervised Learning is based on the Tri-Training method, which the thesis adapts to the practical conditions of the project by improving the construction of the initial training set, the construction of the classifiers, the voting selection strategy, and other aspects. Experiments show that the improved Semi-Supervised Learning method effectively improves the recognition performance of the supervised learning model.
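The selection idea in aspect (2) can be illustrated with a minimal sketch. The abstract does not specify the thesis's actual strategy, so the code below substitutes a generic stand-in: it weights prediction entropy (uncertainty) by the sample's mean cosine similarity to the rest of the unlabeled pool, so that uncertain but isolated samples rank low. The function names, the feature vectors, and the density weighting are all assumptions made for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted label distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def select_batch(pool, k):
    """Pick k samples to send for labeling.

    pool: list of (features, predicted_probs) pairs for unlabeled samples.
    Returns the indices of the k highest-scoring samples, best first.
    """
    scores = []
    for i, (feats, probs) in enumerate(pool):
        # Density: mean similarity to the other pool samples; outliers score low.
        sims = [cosine(feats, f) for j, (f, _) in enumerate(pool) if j != i]
        density = sum(sims) / len(sims) if sims else 1.0
        # Hypothetical scoring rule: uncertainty weighted by density.
        scores.append((entropy(probs) * density, i))
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]

# Toy pool: sample 2 is maximally uncertain but far from every other sample.
pool = [
    ([1.0, 0.0], [0.5, 0.5]),
    ([0.9, 0.1], [0.6, 0.4]),
    ([0.0, 1.0], [0.5, 0.5]),
    ([1.0, 0.1], [0.95, 0.05]),
]
chosen = select_batch(pool, 2)
```

In this toy pool, pure uncertainty sampling would rank sample 2 at the top, while the density-weighted score prefers samples 0 and 1, which are nearly as uncertain and lie in a dense region.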
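The voting step that Tri-Training relies on, as mentioned in aspect (3), can be sketched as follows. This is a simplified illustration of the standard agreement rule only: an unlabeled sample is pseudo-labeled for one classifier when the other two agree on its label. It omits the error-rate checks of the original Tri-Training algorithm and does not reproduce the thesis's improved voting strategy, which the abstract does not detail. The function name and the toy threshold classifiers are hypothetical.

```python
def tri_training_round(classifiers, unlabeled):
    """Run one voting pass over the unlabeled samples.

    classifiers: three callables, each mapping a sample to a label.
    Returns a list of three lists; entry i holds the (sample, label)
    pairs proposed for classifier i by its two peers.
    """
    additions = [[], [], []]
    for x in unlabeled:
        votes = [clf(x) for clf in classifiers]
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            # Only when the two peers agree is x pseudo-labeled for classifier i.
            if votes[j] == votes[k]:
                additions[i].append((x, votes[j]))
    return additions

# Toy classifiers (assumed for illustration): three different thresholds.
clfs = [lambda x: x > 0, lambda x: x > 1, lambda x: x > 2]
proposed = tri_training_round(clfs, [0.5, 1.5, 3])
```

In a full loop, each classifier would be retrained on its original labeled data plus its proposed pairs, and the pass would repeat until no classifier changes.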
Keywords/Search Tags: Chinese electronic medical records, entity recognition, entity assertion, Active Learning, Semi-Supervised Learning