
Research On Medical Entity Extraction Based On SVM And CRF

Posted on: 2019-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhang
Full Text: PDF
GTID: 2428330548477639
Subject: Electronic and communication engineering
Abstract/Summary:
Driven by the rapid development of artificial intelligence, many fields have begun in-depth research and development that draws on machine learning methods. Within information and signal processing, progress in language text information processing, that is, natural language processing, has been particularly prominent. Entity extraction is one of the basic research directions of language text information processing and an important underlying technology for information extraction, text understanding, text data mining, and related tasks. Applying entity extraction to Internet medical care has great research value, because it makes better use of electronic medical text and further promotes the development of Internet medical care. This thesis focuses on medical disease entity extraction, that is, marking and classifying the disease, symptom, and drug names in electronic medical texts, with the aim of obtaining a more effective disease entity extraction model and further improving extraction performance. It selects the support vector machine (SVM) and the conditional random field (CRF) as disease entity extraction models and studies disease entity extraction and its application from four aspects: corpus construction; data preprocessing with feature set selection and extraction; training of the entity extraction models; and evaluation of the extraction results. The main work of this thesis is as follows:

(1) Building the medical disease corpus. Because the construction of electronic medical corpora is still in its infancy in China, no mature corpus was available for this study. This thesis collects doctor-patient consultation data from the website 120 ask as the original corpus. Based on the structural characteristics of the original consultation text, the data were stripped of redundant content and standardized so that only the key information remained, namely the patient's symptoms and the doctor's diagnosis. This produced the corpus needed for the study.

(2) Data preprocessing and feature set selection and extraction. Based on the characteristics of the Chinese language and of disease entities, five features that are frequently used in Chinese entity extraction research were selected as the feature set, mainly including character features, part-of-speech features, lexical features, contextual features, and word formation features. Once the feature set was determined, feature extraction was performed on the experimental corpus and the corpus was annotated with these features.

(3) Constructing the SVM and CRF disease entity extraction models. The entity extraction models are obtained by training, and two supervised learning models, SVM and CRF, were selected. Before training, the experimental corpus is divided into a training set and a test set, and the features of the annotated corpus are normalized. The corpus data are then converted into the input format that each model requires. Finally, the training data were fed into the models, and after parameter and model tuning the SVM and CRF disease entity extraction models were established.

(4) Evaluating the performance of the entity extraction models. Three control experiments were set up: the effect of the amount of training corpus on the extraction performance of the trained models; a comparison of the CRF and SVM models on disease entity extraction; and a comparison of the influence of part-of-speech and word formation features on the extraction performance of the CRF model. The experimental results show that, within a certain range, model performance increases with the amount of training corpus; that the CRF model is better suited to disease entity extraction than the SVM model; and that the choice of feature set has a great influence on the disease entity extraction model.

The disease entity extraction technology studied in this thesis has important implications for building models that relate diseases to their symptoms and for advancing Internet medical care.
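
As a rough illustration of items (2) and (4) in the abstract, the Python sketch below builds per-character features of the kind a sequence labeler such as a CRF or SVM could consume, and scores predicted entities against gold annotations with entity-level precision, recall, and F1. Everything concrete here is an assumption made for the example and is not taken from the thesis: the BIO tagging scheme, the tag names `SYM` and `DIS`, the one-character context window, the function names, and the sample sentence. The abstract does not state which tagging scheme, window size, or evaluation metric was used.

```python
from typing import Dict, List, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def char_features(chars: List[str], pos_tags: List[str], i: int) -> Dict[str, str]:
    """Character, part-of-speech, and +/-1 context features for position i.

    Hypothetical feature template; the thesis lists five feature families
    but the abstract does not give their exact definitions.
    """
    return {
        "char": chars[i],
        "pos": pos_tags[i],
        "prev_char": chars[i - 1] if i > 0 else "<BOS>",
        "next_char": chars[i + 1] if i < len(chars) - 1 else "<EOS>",
    }


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence (assumed scheme) into a set of entity spans."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 using exact span and type match."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up sample: "头痛" (headache, a symptom) and "感冒" (cold, a disease).
chars = list("头痛是感冒")
pos_tags = ["n", "n", "v", "n", "n"]  # illustrative part-of-speech tags
gold = ["B-SYM", "I-SYM", "O", "B-DIS", "I-DIS"]
pred = ["B-SYM", "I-SYM", "O", "O", "O"]  # a model that misses the disease

features = [char_features(chars, pos_tags, i) for i in range(len(chars))]
print(features[0])
print(entity_prf(gold, pred))
```

Scoring at the entity level rather than per character means a prediction only counts when both its boundaries and its type match the annotation, which is the stricter and more common convention for entity extraction. Running the same scoring function on models trained with different corpus sizes or feature subsets would give the kind of comparison described in the three control experiments.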
Keywords/Search Tags: Entity Extraction, Medical Entity, Conditional Random Fields, Support Vector Machine