In recent years, with the rapid development of medical informatization, the volume of biomedical text data has grown explosively. Biomedical text mining, which extracts useful information from this massive body of text, has therefore become essential for supporting medical research and clinical decision making. The success of the pre-trained language model BERT demonstrated the importance of pre-trained models in natural language processing. In the biomedical domain, BioBERT, SciBERT, and other BERT-based derivative models acquire biomedical knowledge by pre-training on large biomedical corpora and have achieved good results on several biomedical text mining tasks. However, most biomedical pre-trained language models rely on the traditional Masked Language Model (MLM) task: when the random masking strategy fails to mask medically relevant terms, the model cannot adequately capture medical contextual semantic relationships. In addition, the scarcity of Chinese biomedical corpus resources and the complexity and diversity of Chinese medical terminology make it difficult for models to learn Chinese biomedical knowledge. To address these issues, the main contributions of this paper are as follows:

1. We propose CMedBERT, a Chinese biomedical pre-trained language model based on knowledge injection. The model uses the article structure of a medical encyclopedia as a weakly supervised signal: the medical terms and their aspects contained in the encyclopedia structure serve as labels, and the model is trained to infer the corresponding medical term and aspect from the medical description text, thereby capturing biomedical contextual semantic information (a minimal sketch of this objective follows this summary). This approach both avoids the failure to capture medical contextual semantic relations caused by the traditional MLM random masking strategy and reduces the time and cost of manual annotation.

2. To further improve performance, we introduce adversarial training in the fine-tuning phase of downstream tasks. The model is regularized by adding adversarial perturbations to the word embedding layer, and five adversarial training methods, FGM, PGD, SMART, ALUM, and FreeLB, are compared experimentally (a minimal FGM sketch also follows this summary).

3. To evaluate the model on biomedical text mining tasks, we conduct experiments on eight Chinese medical information processing tasks from the CBLUE benchmark 1.0. Among them, the CMedBERT-adv model based on the FGM adversarial training method improves the average score by 1.8% over the four baseline models.
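The following is a minimal sketch of the weakly supervised knowledge-injection objective described in contribution 1: given a medical description text, the encoder must predict both the medical term and the aspect (encyclopedia section) the text was drawn from. It assumes a Hugging Face BERT encoder; the class name, head sizes, and field names are illustrative assumptions, not the authors' released code.

```python
# Sketch: weakly supervised term/aspect prediction from medical description
# text. TermAspectModel, num_terms, and num_aspects are hypothetical names.
import torch.nn as nn
from transformers import BertModel

class TermAspectModel(nn.Module):
    def __init__(self, bert_name="bert-base-chinese",
                 num_terms=10000, num_aspects=20):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        # Two classification heads over the [CLS] representation:
        # one for the medical term, one for its aspect label.
        self.term_head = nn.Linear(hidden, num_terms)
        self.aspect_head = nn.Linear(hidden, num_aspects)
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, input_ids, attention_mask, term_labels, aspect_labels):
        cls = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask).last_hidden_state[:, 0]
        # Joint loss: the encoder must capture enough medical context to
        # recover both the term and the aspect of the description text.
        loss = self.loss_fn(self.term_head(cls), term_labels) \
             + self.loss_fn(self.aspect_head(cls), aspect_labels)
        return loss
```

Unlike random MLM masking, every training example here carries a medically meaningful label, which is how the encyclopedia structure substitutes for manual annotation.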
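Below is a minimal sketch of FGM, the adversarial training method from contribution 2 that gave the best average score. It adds a gradient-direction perturbation to the word embedding weights during fine-tuning. The embedding parameter name "word_embeddings" matches Hugging Face BERT; epsilon and the training-loop details are illustrative assumptions.

```python
# Sketch: FGM (Fast Gradient Method) adversarial training on the word
# embedding layer during fine-tuning. Epsilon is an assumed default.
import torch

class FGM:
    def __init__(self, model, epsilon=1.0, emb_name="word_embeddings"):
        self.model = model
        self.epsilon = epsilon
        self.emb_name = emb_name
        self.backup = {}

    def attack(self):
        # Perturb embedding weights along the normalized gradient direction.
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.emb_name in name:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0 and not torch.isnan(norm):
                    param.data.add_(self.epsilon * param.grad / norm)

    def restore(self):
        # Remove the perturbation after the adversarial backward pass.
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}

# Usage inside one training step (illustrative):
#   loss = model(**batch).loss
#   loss.backward()                  # gradients on clean inputs
#   fgm.attack()                     # perturb embeddings
#   model(**batch).loss.backward()   # accumulate adversarial gradients
#   fgm.restore()                    # restore original embeddings
#   optimizer.step(); optimizer.zero_grad()
```

PGD, SMART, ALUM, and FreeLB follow the same idea but take multiple perturbation steps or add regularization terms, trading extra computation for potentially stronger smoothing.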