
The Extraction Of Clinical Manifestations And Clinical Events From Outpatient Electronic Medical Records Of Traditional Chinese Medicine

Posted on: 2022-10-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Q Liu
GTID: 1484306341489514
Subject: Integrative basis
Abstract/Summary:
Background: Under the wave of informatization, the number of electronic medical records in TCM outpatient clinics is increasing rapidly. However, these outpatient records usually exist as unstructured text. Turning them into clinical practice and medical evidence requires extensive processing and summarization, which is time-consuming and laborious. Natural language processing offers a rich research foundation for extracting specific information from unstructured text. However, when a model that is effective in the general domain is applied to the medical domain, existing techniques still need domain-specific adaptation. In practical TCM clinical data processing, two problems are often the difficult points in structuring TCM outpatient medical records: the overlapping and interleaved expression of clinical manifestations, and the extraction of related information. Both tasks can usually be handled by sequence labeling models, but existing methods do not resolve these key problems in detail.

Objective: The purpose of this research can be summarized as follows: (1) Taking the extraction of clinical manifestations and clinical events from the electronic medical records of TCM outpatient clinics as the starting point, study in depth the characteristics of key clinical information extraction tasks in the field of TCM. (2) Drawing on current cutting-edge natural language processing methods, propose a method suited to the characteristics of TCM outpatient electronic medical record text, and use it to extract overlapping clinical manifestations and clinical events.

Methods: The research first analyzed in depth the characteristics of clinical manifestations and events in TCM electronic medical records and organized the whole study under the task framework of sequence labeling. First, it established a framework for extracting key clinical information from TCM outpatient electronic medical records and set the specific goals of the two types of information extraction. Second, two corpora were constructed: an annotated corpus of TCM internal medicine outpatient electronic medical records for clinical manifestation extraction, and an annotated corpus of outpatient electronic medical records of TCM treatment of polycystic ovary syndrome for clinical event extraction. Third, according to the characteristics of clinical manifestations in TCM outpatient electronic medical records, a two-stage extraction method for nested, discontinuous, and overlapping clinical manifestations was proposed, and a joint learning model based on a multi-head mechanism was constructed to complete the extraction task. According to the way clinical events are recorded in TCM outpatient electronic medical records, a method for extracting triples of relation subject, relation object, and relation type was proposed, and a cascading pointer network was constructed to accomplish this task. Finally, the two annotated corpora were used to run extraction experiments for the two proposed methods, and the results were analyzed. The neural network models used in the research are based on the PyTorch 1.4 and TensorFlow 2.0 frameworks. All methods are evaluated with precision, recall, and F1, the harmonic mean of the two.

Results: Construction of the TCM outpatient electronic medical record corpora shows that clinical manifestations in TCM outpatient records mostly appear as discontinuous entities and overlapping entities. Menstrual events are recorded frequently and in complex ways in TCM gynecological clinics; the descriptions of each event element take various forms, and completeness varies significantly. In extracting the role relationships of event arguments, it is typical for the same event element to participate in multiple argument role relationships.

The annotated TCM internal medicine outpatient corpus contains 2,255 visits and a total of 43,143 clinical expressions. More than 63% of clinical manifestations, across all types, were recorded in a discontinuous form. Clinical manifestations composed of independent symptomatic morpheme entities accounted for 10.91%, and clinical manifestations consisting of multiple consecutive entities accounted for 18.51%. More than 43.81% of clinical cases are overlapping and interleaved manifestations. The annotated polycystic ovary syndrome medical record corpus contains 783 diagnoses, with a total of 1,487 menstrual events, 17,984 event elements, and 14,116 argument role relationships. About 30% of the 17,984 event subjects and arguments participated in two or more argument roles, and nearly 90% of these event elements overlapped 1 to 4 times. Among all menstrual event records, the items recorded most completely are the date of the first day of menstruation, the duration of menstruation, the use of ovulation induction drugs, and the amount of menstruation: 72.49% recorded the duration of menstruation, 66.04% recorded whether the current menstruation was related to medication, and 60.66% recorded menstrual volume. Among all argument role relationships, 73.8% represent the subordination relationship between the event subject and event elements, and 26.2% represent the supplementary explanation relationship between menstrual-related information and descriptive words.

The clinical manifestation extraction experiments show that the multi-head mechanism joint-learning model using the label embedding strategy performed best, with a recall of 81.83% and an F1 of 82.16%. The precision of the pipeline model using the label embedding strategy reaches 84.96%. BERT pre-trained on a TCM clinical corpus improves both the pipeline method and the joint-learning method: the F1 of the pipeline extraction model using TCM semantic BERT is 72.22%, and the F1 of the joint-learning model using TCM semantic BERT is 80.16%. The label embedding strategy improved the pipeline method more markedly: after it was adopted, the precision of the pipeline method increased from 74.96% to 84.96% and F1 increased from 72.22% to 78.51%. After adopting the label embedding strategy, the F1 of the joint-learning model using TCM semantic BERT as the language representation increased from 81.02% to 82.16%.

The clinical event extraction experiments show that the cascading pointer network using RoBERTa as the language representation model achieves the best extraction results when projected gradient descent (PGD) is used for adversarial training. The model achieved 89.71% precision, 92.42% recall, and 91.05% F1 in the subtask of menstrual event element extraction. In the final event extraction task, precision was 76.11%, recall was 79.51%, and F1 was 77.78%. Compared with the random selection strategy, the precision of the BERT model trained with whole-word masking increased by 3.31% to 88.62%, recall increased by 0.5% to 92.27%, and F1 increased by 1.28% to 90.41%. The cascading pointer network that uses RoBERTa to represent language features improved precision by 2.09% and achieved the highest F1 of 74.34%. With PGD, the precision of the three language representation models increased by 1.16%, 1.09%, and 2.86%, respectively. The RoBERTa model trained with PGD achieved the best event element extraction result, 91.5%, which was 1.75% higher than the baseline method. The fast gradient method improves the recall of BERT-wwm-ext by up to 8.42%. Overall, the RoBERTa model with PGD achieved the highest F1 of 77.78% on the experimental data set. The menstrual event extraction experiments on TCM gynecological outpatient medical records show that the BERT model trained with whole-word masking on an expanded pre-training corpus (BERT-wwm-ext) extracts better than BERT-base in all tasks. The cascading expansion strategy is better than the random selection strategy in all tasks. Different language representation models differ in their sensitivity to perturbation strategies, but overall, adversarial training strategies significantly improve the various models.

Conclusion: The two-stage extraction method designed in this study can effectively extract clinical manifestations from the outpatient medical records of TCM internal medicine. The constructed multi-head mechanism model is superior to the baseline method in identifying overlapping relationships and can automatically capture semantic features without any help from external NLP tools. It is better suited to extracting clinical manifestation entities from the electronic medical records of TCM internal medicine outpatient clinics. The triple extraction strategy adopted in the research is well suited to extracting menstrual events in TCM gynecological outpatient clinics. The constructed joint-learning model based on a cascading pointer network is suitable for structuring TCM gynecological outpatient medical records.
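The evaluation described in the abstract (precision, recall, and their harmonic mean F1, applied to extracted triples) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and not taken from the dissertation: the `Triple` type, the relation names, the toy gold and predicted sets, and the exact-match criterion (the abstract does not say how a predicted triple is matched against the annotation).

```python
from typing import NamedTuple, Set, Tuple


class Triple(NamedTuple):
    """Hypothetical container for one (subject, object, relation) triple."""
    subject: str   # event subject, e.g. a menstrual event mention
    obj: str       # event element linked to the subject
    relation: str  # argument role / relation type


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def prf1(gold: Set[Triple], pred: Set[Triple]) -> Tuple[float, float, float]:
    """Precision, recall and F1 under exact matching (an assumed criterion)."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_from(precision, recall)


# Hypothetical toy data: one subject takes part in several relations,
# which is the overlapping pattern the abstract describes.
gold = {
    Triple("menstruation", "2021-03-01", "first_day"),
    Triple("menstruation", "5 days", "duration"),
    Triple("menstruation", "moderate", "volume"),
}
pred = {
    Triple("menstruation", "2021-03-01", "first_day"),
    Triple("menstruation", "5 days", "duration"),
    Triple("menstruation", "heavy", "volume"),
    Triple("menstruation", "ovulation induction drug", "medication"),
}

p, r, f = prf1(gold, pred)
print(f"P={p:.4f} R={r:.4f} F1={f:.4f}")

# Consistency check on reported figures: F1 from P=89.71 and R=92.42.
print(f"{f1_from(89.71, 92.42):.2f}")
```

Applied to the reported precision and recall for menstrual event element extraction (89.71% and 92.42%), `f1_from` gives a value within about 0.01 of the reported 91.05% F1; the small gap is expected because the published precision and recall are themselves rounded.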
Keywords/Search Tags:Outpatient electronic medical records, clinical manifestations, clinical events, information extraction, deep neural networks