Semantic-based Medical Named Entity Recognition Algorithm And Medical Context Recognition Algorithm

Posted on: 2019-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: S H Song
GTID: 2404330545471539
Subject: Engineering
Abstract/Summary:
Chinese medical websites hold a large volume of medical data that is rich in medical information. However, because the data is so large and the web text is poorly structured, the key information is difficult to obtain. Identifying and extracting the key data is therefore of great significance. This paper proposes two medical named entity recognition algorithms, one dictionary-based and one rule-based, for recognizing entity names and the relationships between medical entities in web text. In addition, it proposes a medical context recognition algorithm for classifying the medical context of web texts. The main work of this paper is as follows.

1. Web text data and dictionary acquisition
The crawler software was developed with a combination of the HtmlUnit and JSoup frameworks. The medical web texts used in this paper were collected from a medical website, with the text of each consultation link retrieved according to department classification. In total, 120,171 web pages were obtained. The dictionaries were obtained from a number of medical websites selected from ALEXA, from the official website of the State Food and Drug Administration, and from the ICD-10 disease resource. The resulting entity dictionaries are a disease dictionary, a symptom dictionary, a drug dictionary, a surgery dictionary, an examination dictionary, and a food dictionary.

2. Medical named entity recognition algorithm
Using the large medical named entity dictionaries obtained by the crawler software, this paper designs a dictionary-based medical named entity recognition algorithm to identify diseases, symptoms, drugs, surgeries, examinations, and foods. For disease entities, it also proposes a rule-based disease named entity recognition algorithm. The algorithm uses semantic regularities to obtain feature words, which come from the start, middle, and end parts of disease named entities. When the text is scanned, a string that contains a feature word and satisfies one of the following rules is considered the name of a disease entity.
(1) The feature words in the string can be combined into a "start-middle-end" form, and the string contains no other words.
(2) The feature words in the string can be combined into a "middle-start-end" form, and the string contains no other words.
(3) The string contains only an end feature word, so only the "end" form appears.

3. Medical context recognition algorithm
Building on entity recognition, this paper designs a medical context recognition algorithm based on text semantics that divides medical context into three types: diagnostic context, treatment context, and rehabilitation context. The context type is identified from the entity name types and feature words contained in the text. To improve recognition accuracy, the diagnostic context is subdivided into the doctor's diagnostic context and the patient's diagnostic context, and the treatment context is subdivided into the doctor's treatment context and the patient's treatment context.

The dictionary-based entity recognition algorithm achieves an average precision of 82.64%, an average recall of 67.25%, and an average F-measure of 72.54% on medical named entities. The rule-based disease recognition algorithm achieves a precision of 60.42%, a recall of 80.99%, and an F-measure of 69.21% on disease names. The medical context recognition algorithm, which recognizes the three types of medical context, achieves an average precision of 77.97%, an average recall of 68.54%, and an average F-measure of 72.95%. The experimental data indicate that the dictionary-based entity recognition algorithm and the rule-based disease recognition algorithm give good, reliable results, and that the medical context recognition algorithm can differentiate complex medical text contexts.
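
The three rules for disease names can be illustrated with a minimal sketch. It is not the thesis's implementation: the feature word lists are hypothetical placeholders, the input is assumed to be already segmented into tokens, and "no other word is included" is read here as the feature words being adjacent tokens. The function name `find_disease_names` is also an assumption made for illustration.

```python
# Hypothetical feature words for illustration only; the thesis derives
# its own lists from semantic regularities in disease names.
START = {"慢性", "急性"}      # e.g. "chronic", "acute"
MIDDLE = {"胃", "支气管"}     # e.g. "stomach", "bronchus"
END = {"炎", "溃疡"}          # e.g. "-itis", "ulcer"

# Rules (1), (2), (3) from the abstract, longest patterns first.
PATTERNS = [("S", "M", "E"), ("M", "S", "E"), ("E",)]


def label(token):
    """Map a token to S (start), M (middle), E (end), or None."""
    if token in START:
        return "S"
    if token in MIDDLE:
        return "M"
    if token in END:
        return "E"
    return None


def find_disease_names(tokens):
    """Return candidate disease names found in a pre-segmented token list."""
    labels = [label(t) for t in tokens]
    names = []
    i = 0
    while i < len(tokens):
        for pattern in PATTERNS:
            n = len(pattern)
            # The slice is shorter than the pattern near the end of the
            # list, in which case the comparison simply fails.
            if tuple(labels[i:i + n]) == pattern:
                names.append("".join(tokens[i:i + n]))
                i += n
                break
        else:
            i += 1  # no rule matched at this position
    return names


print(find_disease_names(["患者", "有", "慢性", "胃", "炎"]))  # ['慢性胃炎']
```

Because rule (3) accepts any lone end feature word, a sketch like this tends to over-generate candidates. That is consistent with the high recall and lower precision reported for the rule-based algorithm, although the abstract does not itself attribute the figures to this cause.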
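
The reported figures can be cross-checked with the standard balanced F-measure, the harmonic mean of precision and recall. The sketch below assumes the abstract uses this standard definition, which it does not state explicitly. The helper name `f_measure` is hypothetical.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F-measure %) as given in the abstract.
reported = {
    "dictionary-based NER (averages)": (82.64, 67.25, 72.54),
    "rule-based disease NER": (60.42, 80.99, 69.21),
    "context recognition (averages)": (77.97, 68.54, 72.95),
}

for name, (p, r, f) in reported.items():
    print(f"{name}: F1 from P and R = {f_measure(p, r):.2f}, reported = {f:.2f}")
```

Under that definition the rule-based and context recognition figures reproduce the reported F-measures. For the dictionary-based algorithm, the harmonic mean of the averaged precision and recall is about 74.15%, not 72.54%. One possible explanation, which is an assumption, is that the reported value is an average of F-measures computed separately for each entity type.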
Keywords/Search Tags: Medical Named Entity Recognition, Medical Context Recognition, Web Crawler, Semantic Rules, Feature Words