
Research On Few-shot Text Generation With Pre-trained Language Model

Posted on: 2022-03-13    Degree: Master    Type: Thesis
Country: China    Candidate: Y W Sun    Full Text: PDF
GTID: 2518306572950849    Subject: Computer Science and Technology
Abstract/Summary:
In recent years, deep neural networks have been applied successfully to natural language text generation, and a series of generation tasks such as machine translation, text summarization, and dialogue systems have improved qualitatively. However, deep learning still faces the key challenge of being "data hungry". Because deep neural networks usually have a large number of parameters, they are prone to overfitting and generalize poorly when training data are insufficient. Alleviating this problem normally requires a great deal of effort to manually build sufficiently large, high-quality training sets for the corresponding tasks, yet manual annotation of large-scale data is expensive and time-consuming, and it is impractical to provide enough training data for every task in every domain. This few-shot limitation has greatly hindered the further development and practical application of many text generation tasks. To alleviate this dilemma, this thesis proposes to use pre-trained language models to optimize few-shot text generation and introduces dedicated optimization strategies for three different downstream generation tasks: few-shot story generation with a pre-trained language model, few-shot table-to-text generation with a pre-trained language model, and few-shot image captioning with a pre-trained language model.

For few-shot story generation, we adopt the GPT-2 pre-trained language model as the text generation framework and modify the loss function during fine-tuning to adapt it to conditional generation. Story generation demands relatively strong logical reasoning, but the commonsense knowledge stored in GPT-2 is insufficient for this. To address the issue, we use the COMET model to generate, from the story content, event-level commonsense inferences that the story ending may involve; after sorting and filtering, these event-level commonsense statements are supplied to GPT-2 as a supplement to its input. Experimental results show that this approach makes great progress over the baseline model on the few-shot ROCStories dataset, which proves the effectiveness of our method.

For few-shot table-to-text generation, we also adopt the GPT-2 pre-trained language model as the text generation framework. Because the table format does not match the input format GPT-2 saw during pre-training, we propose a template-based table transformation method that encodes the structured table as a sequence. At the same time, we introduce multi-task learning and add two extra training objectives, table reconstruction and content matching, during fine-tuning. Experimental results show that this approach achieves strong results on the few-shot Wikipedia dataset used in this thesis; compared with the latest Seq2Seq method that also uses GPT-2, our method makes clear progress and reaches a new state of the art, which proves the effectiveness of our method.
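To make the fine-tuning strategy above more concrete, the following is a minimal sketch, not the thesis implementation, of template-based table linearization combined with a conditional language-modeling loss for GPT-2, written against the HuggingFace transformers API. The template wording, the "=>" separator, and the example record are illustrative assumptions.

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    def linearize_table(table):
        # Encode a structured table as a flat sequence with a fixed template,
        # e.g. {"name": "Walter Extra"} -> "name is Walter Extra ."
        return " ".join(f"{attr} is {value} ." for attr, value in table.items())

    def conditional_lm_loss(table, target_text):
        # Cross-entropy is computed only on the target description, not on the
        # linearized table prompt, adapting the LM objective to conditional generation.
        prompt_ids = tokenizer.encode(linearize_table(table) + " =>")
        target_ids = tokenizer.encode(" " + target_text + tokenizer.eos_token)
        input_ids = torch.tensor([prompt_ids + target_ids])
        labels = torch.tensor([[-100] * len(prompt_ids) + target_ids])  # -100 = ignored
        return model(input_ids=input_ids, labels=labels).loss

    loss = conditional_lm_loss(
        {"name": "Walter Extra", "occupation": "aircraft designer"},
        "Walter Extra is a German aircraft designer.",
    )
    loss.backward()  # an optimizer step would follow in an actual fine-tuning loop

Masking the prompt positions with -100 is one standard way to restrict the loss to the generated text; the table-reconstruction and content-matching objectives described above would be added to this loss during multi-task fine-tuning.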
For few-shot image captioning, we take into account the modal inconsistency between the input image and the output text, and adopt the Oscar model, initialized from the BERT pre-trained language model, to ease the difficulty of learning semantic alignment between image and text. We use self-critical sequence training (SCST) in the second fine-tuning stage to address the exposure bias problem. In addition, we adopt constrained beam search instead of traditional beam search at generation time, so that the model guarantees that all object tags of the image appear in the generated text even when it encounters object tags unseen during training. Experimental results show that this approach makes significant progress on the few-shot COCO Image Caption dataset compared with the latest self-distillation technique with model ensembling, which proves the effectiveness of our method.
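As a rough illustration of the self-critical sequence training stage mentioned above, the following sketch shows the SCST policy-gradient loss, in which a sampled caption is rewarded relative to a greedy-decoded baseline. The captioning-model interface (model.sample, model.greedy_decode), the reward function (e.g. CIDEr), and the tensor shapes are placeholders assumed for the example, not details taken from the thesis.

    import torch

    def scst_loss(model, images, references, reward_fn):
        # Sample one caption per image and keep the per-token log-probabilities.
        sampled_ids, sampled_logprobs = model.sample(images)       # assumed API
        # Greedy decoding provides the self-critical baseline (no gradient needed).
        with torch.no_grad():
            greedy_ids, _ = model.greedy_decode(images)             # assumed API
        # Sentence-level rewards, e.g. CIDEr against the reference captions.
        r_sample = reward_fn(sampled_ids, references)               # shape (batch,)
        r_greedy = reward_fn(greedy_ids, references)                # shape (batch,)
        advantage = (r_sample - r_greedy).unsqueeze(1)              # (batch, 1)
        # REINFORCE with the greedy score as baseline: sampled captions that beat
        # greedy decoding are reinforced, those that do worse are penalized.
        return -(advantage * sampled_logprobs).sum(dim=1).mean()

Because the gradient flows only through the log-probabilities of the sampled sequence, the model directly optimizes the sequence-level reward, which mitigates the exposure bias introduced by teacher forcing.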
Keywords/Search Tags:few-shot, story generation, table-to-text generation, image captioning, pre-trained language model