
Research On The Application Of BERT In Text Analysis

Posted on: 2021-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: C Liu
Full Text: PDF
GTID: 2428330602476503
Subject: Software engineering
Abstract/Summary:
With the development of pre-training techniques, pre-trained language models have gradually become a new paradigm in natural language processing (NLP). A pre-trained language model provides better contextual semantic representations and alleviates the problem of polysemy, and it can be applied to downstream tasks through either feature-based or fine-tuning approaches. How to adapt a pre-trained language model to the characteristics of a given task, so that it suits different applications and more complex problems, is the central question in the application and study of such models. This thesis examines the application of the pre-trained language model BERT (Bidirectional Encoder Representations from Transformers) to three text analysis tasks and proposes new models tailored to the characteristics of each task. The main research contents are as follows:

(1) The application of BERT to function word usage recognition. Function word usage recognition can be defined as single-label classification, but compared with other text classification tasks, the model must capture deeper semantic relations in the text. The experiments focus on the common Chinese function word "de" and follow the standard pattern of fine-tuning BERT for text classification. The results show that BERT is effective for this task: its F1 score reaches 88.5%, 5.7 percentage points higher than the previous best result. A finer-grained analysis further shows that BERT effectively alleviates the impact of data imbalance.

(2) The application of BERT to auxiliary diagnosis from obstetric electronic medical records, for which "BERT with Enhanced Layer" is proposed. Auxiliary diagnosis is treated as multi-label text classification. Because electronic medical records are complex and long, the original BERT text classification setup cannot effectively cover the information of a whole record, so an Enhanced Layer is added to BERT to enrich the representation of the input record. Two strategies are designed for the Enhanced Layer: strategy A (attention) and strategy A-AP (attention plus average pooling). The results show that BERT already performs well on this task, with an F1 score of 79.58%, and the two Enhanced Layer strategies improve on this baseline by 0.68 and 0.7 percentage points, respectively.
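A minimal sketch of the two Enhanced Layer strategies, assuming the layer pools BERT's token-level outputs (an attention-weighted sum for strategy A, concatenated with average pooling for strategy A-AP) before a multi-label classifier; the abstract does not specify the architecture, so all module names and dimensions here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class EnhancedLayer(nn.Module):
    """Pools BERT token outputs into an enriched record representation.

    strategy="A":    attention-weighted sum of token states
    strategy="A-AP": attention-weighted sum concatenated with average pooling
    (Names and dimensions are illustrative assumptions, not the thesis's spec.)
    """
    def __init__(self, hidden_size: int, num_labels: int, strategy: str = "A"):
        super().__init__()
        self.strategy = strategy
        self.attn = nn.Linear(hidden_size, 1)  # scores each token
        out_dim = hidden_size * (2 if strategy == "A-AP" else 1)
        self.classifier = nn.Linear(out_dim, num_labels)

    def forward(self, token_states, attention_mask):
        # token_states: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
        scores = self.attn(token_states).squeeze(-1)            # (batch, seq_len)
        scores = scores.masked_fill(attention_mask == 0, -1e9)  # ignore padding
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)   # (batch, seq_len, 1)
        pooled = (weights * token_states).sum(dim=1)            # strategy A
        if self.strategy == "A-AP":
            lengths = attention_mask.sum(dim=1, keepdim=True).clamp(min=1)
            avg = (token_states * attention_mask.unsqueeze(-1)).sum(dim=1) / lengths
            pooled = torch.cat([pooled, avg], dim=-1)           # strategy A-AP
        # Multi-label logits; train with BCEWithLogitsLoss.
        return self.classifier(pooled)
```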
(3) The application of BERT to conversational reading comprehension, for which HisBERT (BERT with conversation history) is proposed. HisBERT extends BERT by taking the dialogue history as an additional input source; through interactive encoding with the current question and self-encoding, the model handles conversational reading comprehension better. In addition, the method of adversarial learning is used to add random disturbance signals to the word embedding layer of HisBERT, improving the robustness of the model. Experiments show that, compared with the BERT baseline, HisBERT raises the F1 score by 10.8 points to 65.2%, a competitive result. Ablation and visualization analyses further demonstrate the effectiveness of HisBERT.
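A minimal sketch of the embedding-layer perturbation described above, assuming an FGM-style scheme that steps along the gradient of the word embedding matrix during training. The abstract states only that adversarial disturbance signals are added at the embedding layer, so the FGM choice, the epsilon value, and the parameter name are assumptions:

```python
import torch

class EmbeddingPerturbation:
    """FGM-style adversarial perturbation of the word embedding layer.

    The thesis says only that disturbance signals are added at the embedding
    layer of HisBERT; this FGM variant, epsilon, and the parameter-name filter
    are assumptions for illustration.
    """
    def __init__(self, model, epsilon: float = 1.0, name: str = "word_embeddings"):
        self.model, self.epsilon, self.name = model, epsilon, name
        self.backup = {}

    def attack(self):
        # Perturb the embedding weights in the direction of their gradient.
        for n, p in self.model.named_parameters():
            if p.requires_grad and self.name in n and p.grad is not None:
                self.backup[n] = p.data.clone()
                norm = torch.norm(p.grad)
                if norm != 0:
                    p.data.add_(self.epsilon * p.grad / norm)

    def restore(self):
        # Undo the perturbation before the optimizer step.
        for n, p in self.model.named_parameters():
            if n in self.backup:
                p.data = self.backup[n]
        self.backup = {}

# Typical training step (sketch): loss.backward(); fgm.attack();
# adv_loss = compute_loss(model, batch); adv_loss.backward();
# fgm.restore(); optimizer.step(); optimizer.zero_grad()
```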
Keywords/Search Tags:BERT, Function Word Usage Recognition, Obstetrics Electronic Medical Record Auxiliary Diagnosis, Conversational Reading Comprehension