Medical Text Information Extraction Based On Deep Learning

Posted on: 2020-08-21
Degree: Master
Type: Thesis
Country: China
Candidate: W B Tu
GTID: 2404330590983354
Subject: Computer application technology
Abstract/Summary:
Information extraction from medical texts is an important part of natural language processing in the medical field. Its purpose is to mine valuable information from electronic medical texts and to extract and analyze it, helping doctors and patients analyze diseases and advancing intelligent medicine. As living standards improve and concern for health grows, it is important to develop medical informatization and to transform large volumes of medical text into medical knowledge through natural language processing technology. However, using natural language processing technology to extract information from Chinese text raises the following problems:

(1) Both Chinese characters and words carry semantics, unlike English and other languages in which the word is the smallest language unit. In Chinese natural language processing, different scenarios place different requirements on the representation of Chinese: some need characters as the basic unit, and some need words. Therefore, the task of text information extraction needs a Chinese word segmentation algorithm with good robustness and adaptability.

(2) Medical texts contain a lot of valuable information, such as "disease name", "location of disease", "symptoms", "drug name" and "treatment means". This information is the main target of medical text information extraction. Because medical texts are mostly complex, unstructured free text with many descriptions specific to the professional field, extracting this information accurately is both the difficulty and the key point of medical text information extraction.

(3) Recognizing the entities in medical texts is not enough on its own to meet the demands of medical text information extraction. The attributes of entities are also of great significance, such as the location of a tumor, the location of metastasis after the tumor spreads, and the size of the tumor. If such information can be extracted from the electronic medical records of cancer patients, it will greatly help doctors' diagnoses and the intelligent processing of electronic medical records.

In view of the above problems, the main work of this thesis is divided into the following three points:

(1) Considering that existing word segmentation algorithms do not adapt well to text from different fields, this thesis proposes PCNN, a word segmentation model based on a convolutional neural network without pooling. During training, the model efficiently learns the correlations between the dimensions of the word vectors and accurately identifies the label categories of words, thereby completing the word segmentation task. The model also performs well on medical text segmentation tasks.

(2) This thesis takes electronic medical records as the target data for named entity recognition and, based on their characteristics, proposes a cascaded BiLSTM + CRF model for named entity recognition in electronic medical records. The thesis holds that, in a Chinese context, the strokes and Pinyin of Chinese characters also carry semantic information. The model therefore extracts features from the stroke and Pinyin sequences of Chinese characters with a BiLSTM network, combines the resulting stroke and Pinyin features with the Chinese character vectors, and passes them into another BiLSTM. Together these form a cascaded BiLSTM that produces the feature representation of the text sequence. Finally, a CRF layer tags the sequence, and the medical entities are extracted from the tags. Experiments show that the cascaded BiLSTM + CRF medical entity recognition model presented in this thesis performs well on public data sets.

(3) Entities and their attributes are very valuable semantic units in text data, and extracting them is the main work of text information extraction. Extracting entities and entity attributes from unstructured text is the basis of knowledge mining, intelligent retrieval, intelligent question answering and knowledge graph construction. Having put forward the medical entity recognition model to recognize medical entities, this thesis proposes a short text classification model. Combining the word segmentation model and the medical entity recognition model, it uses entity-context text classification to extract entity attributes from the electronic medical records of cancer patients. The entity context is segmented according to the output of the word segmentation model. As a continuation of the research on named entity recognition, medical entity attribute extraction here aims to extract two tumor attributes from the electronic medical records of cancer patients: the location of onset and the location of metastasis. A new model is proposed and its performance is verified by experiments.
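
The word segmentation step described in point (1) above, where the model assigns a label category to each unit and the labels then determine the word boundaries, can be illustrated with a small decoding sketch. The abstract does not name the tag set, so the B/M/E/S scheme (begin, middle, end, single) used here is an assumption, and the function name `decode_bmes` and the example tags are hypothetical; the tags would come from a trained tagger such as PCNN, which is not reproduced here.

```python
from typing import List


def decode_bmes(chars: str, tags: List[str]) -> List[str]:
    """Turn per-character B/M/E/S tags into a list of words.

    Assumption: the tagger emits one tag per character using the common
    B (begin), M (middle), E (end), S (single) scheme. The thesis abstract
    does not state which scheme its model uses.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")

    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        # A new word starts at B or S; flush any unfinished word first
        # so that malformed tag sequences still yield a segmentation.
        if tag in ("B", "S") and current:
            words.append(current)
            current = ""
        current += ch
        # A word ends at E or S.
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        # Trailing characters without a closing E tag.
        words.append(current)
    return words


# Hypothetical tagger output for a five-character phrase.
example_words = decode_bmes("肺癌脑转移", ["B", "E", "S", "B", "E"])
```

With the hypothetical tags above, the phrase is split into three words. In the attribute extraction step of point (3), word boundaries of this kind would serve as the criterion for cutting out the context around each recognized entity.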
Keywords/Search Tags: natural language processing, short text classification, Chinese word segmentation, named entity recognition, entity attribute extraction