
Research On BERT-based Machine Reading Comprehension Method

Posted on: 2022-12-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Ding
Full Text: PDF
GTID: 2518306776460574
Subject: Automation Technology
Abstract/Summary:
With the popularity of large pre-trained models in natural language processing, the "pre-training and fine-tuning" paradigm is increasingly used in downstream tasks, and fine-tuning a pre-trained model can achieve good results. While bringing great convenience to scientific research, pre-trained models are developing in an ever larger and more general direction. This places higher demands on the hardware environment, making it very challenging to train or deploy such models on edge devices and other devices with limited computing power. In addition, in machine reading comprehension tasks it is difficult to collect large amounts of labeled data for model training. To address the scarcity of labeled data, prompt learning, which follows the "pre-training, prompt and prediction" paradigm, has gradually become a research hotspot. Prompt learning reformulates downstream tasks as pre-training tasks, so that the model can make full use of the knowledge acquired during pre-training and improve its performance.

BERT is a popular pre-trained model that represents words dynamically according to their context, and it has achieved excellent results in natural language processing tasks such as text classification, machine translation, and question answering. However, the model also suffers from a large model size and slow inference speed. Building on an in-depth study of BERT, this thesis compresses and optimizes the model and explores the performance of prompt learning based on the model's characteristics. The specific work is as follows:

(1) To address the large number of parameters, slow training speed, and difficult deployment, this thesis compresses the BERT model through knowledge distillation. During distillation, the loss function is formulated from the soft targets and hard targets of distillation (a sketch of such a loss appears after this abstract). The model structure is further optimized by removing the sentence embedding in the embedding layer and replacing the learned position embedding with sinusoidal position encoding, and a hidden-layer scheduling strategy is proposed to alleviate the performance loss caused by the reduced number of parameters. The data processing of the pre-training task is also adjusted to increase the probability that low-frequency words are masked. Comparative experiments show that the improved BERT model reduces the number of parameters by 52.1% and increases inference speed by 25% with only a small performance loss.

(2) To address the difficulty of obtaining large amounts of labeled data, this thesis explores a BERT-based prompt learning method and evaluates it on sentiment analysis tasks. First, the task dataset is analyzed to obtain the frequency of each word under different sentiment polarities. Then, based on these statistics, a prompt function and an answer space are designed to transform the downstream task into a pre-training task (see the sketch after this abstract). Finally, the experimental results show that the BERT-based prompt learning method achieves an accuracy of 67% in the zero-shot case and improves model performance to varying degrees in the few-shot case.
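The distillation step in (1) combines a soft-target term from the teacher with a hard-target term from the gold labels. Below is a minimal PyTorch sketch of such a combined loss; the function name, weighting coefficient alpha, and temperature T are illustrative assumptions, not the thesis's exact formulation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    # Soft-target term: match the student's temperature-softened distribution
    # to the teacher's, using KL divergence.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitudes stay comparable across temperatures
    # Hard-target term: ordinary cross-entropy against the gold labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```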
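Item (1) also replaces BERT's learned position embedding with sinusoidal position encoding. The fixed (non-learned) encoding from the Transformer literature can be generated as in the sketch below; the helper name is assumed, and d_model is taken to be even, as it is for BERT's hidden sizes.

```python
import math
import torch

def sinusoidal_position_encoding(max_len, d_model):
    # Fixed sine/cosine encodings: even dimensions use sin, odd dimensions use cos.
    pe = torch.zeros(max_len, d_model)
    position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model)
    )
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe  # shape: (max_len, d_model), added to token embeddings instead of a learned table
```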
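Item (2) turns sentiment analysis into BERT's masked-language-modelling task by means of a prompt function and an answer space (verbalizer). The following hedged sketch uses the Hugging Face transformers API to illustrate the idea; the checkpoint, cloze template, and answer words are placeholder assumptions, not the prompt function actually designed in the thesis.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Hypothetical answer space mapping mask-filling words to sentiment labels.
answer_space = {"great": "positive", "terrible": "negative"}

def classify(sentence):
    # Prompt function: wrap the input in a cloze template containing [MASK].
    prompt = f"{sentence} Overall, it was {tokenizer.mask_token}."
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # Locate the [MASK] position and score only the answer-space words there.
    mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
    ids = {w: tokenizer.convert_tokens_to_ids(w) for w in answer_space}
    scores = {w: logits[0, mask_pos, i].item() for w, i in ids.items()}
    # Verbalizer: map the highest-scoring answer word back to its label.
    return answer_space[max(scores, key=scores.get)]

print(classify("The movie was a waste of time."))  # expected: "negative"
```

Because prediction reuses the masked-language-modelling head rather than a newly initialized classifier, this setup can produce label predictions with no task-specific training at all, which is why the zero-shot accuracy reported in (2) is meaningful.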
Keywords/Search Tags: Machine Reading Comprehension, Knowledge Distillation, Prompt Learning