
Enhancing Pre-trained Language Models For Machine Reading Comprehension

Posted on: 2022-04-29    Degree: Master    Type: Thesis
Country: China    Candidate: J J Yang    Full Text: PDF
GTID: 2558307052459124    Subject: Electronic and communication engineering
Abstract/Summary:
Pre-trained language models have proven effective for learning contextualized language representations and have been successfully applied to machine reading comprehension (MRC) tasks. However, current approaches use only the output of the pre-trained language model's final layer when fine-tuning on downstream MRC tasks. This dissertation argues that relying on a single layer's output restricts the power of the pre-trained representation. It therefore deepens the language representation by absorbing a complementary representation, computed dynamically by an explicit HIdden Representation Extractor (HIRE), into the output of the final layer. Using RoBERTa as the backbone encoder, the proposed improvement over pre-trained language models proves effective on the extractive MRC dataset SQuAD and also helps the model proposed in this dissertation rival state-of-the-art models on the natural language understanding benchmark GLUE.

In addition, the answer prediction layer based on single-span extraction, which pre-trained language models adopt for extractive MRC, generally suffers from producing incomplete answers or introducing redundant words when applied to generative MRC. This dissertation therefore extends single-span extraction to multiple spans, proposing a new framework, MUSST, that allows generative MRC to be solved smoothly as multi-span extraction. Thorough experiments on the challenging multi-passage generative MRC dataset MS MARCO v2.1 demonstrate that MUSST alleviates the problems of single-span extraction and produces answers with better-formed syntax and semantics. Both proposed enhancements can be adopted by various pre-trained language models without being restricted by their model architecture or pre-training process.
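As a rough illustration of the layer-fusion idea, the sketch below combines the intermediate hidden states of a RoBERTa encoder with its final-layer output. It is not the dissertation's actual HIRE module, whose internals are not described in this abstract: the class name LayerFusionEncoder, the learned per-layer softmax weights, the sigmoid gate, and the use of the HuggingFace Transformers API are all assumptions made for illustration only.

```python
# Minimal sketch (assumed, not the dissertation's exact HIRE module):
# absorb a complementary representation built from intermediate encoder
# layers into the final-layer output of a pre-trained RoBERTa model.
import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizer


class LayerFusionEncoder(nn.Module):  # hypothetical name for illustration
    def __init__(self, model_name: str = "roberta-base"):
        super().__init__()
        # output_hidden_states=True exposes every layer's hidden states
        self.encoder = RobertaModel.from_pretrained(
            model_name, output_hidden_states=True
        )
        hidden = self.encoder.config.hidden_size
        num_layers = self.encoder.config.num_hidden_layers
        # Learnable per-layer weights: how much each intermediate layer contributes
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))
        # Gate deciding, per token, how much complementary signal to absorb
        self.gate = nn.Linear(2 * hidden, hidden)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        final = out.last_hidden_state                       # (B, T, H)
        # hidden_states[0] is the embedding layer; [1:] are the Transformer layers
        inter = torch.stack(out.hidden_states[1:], dim=0)   # (L, B, T, H)
        weights = torch.softmax(self.layer_weights, dim=0)  # (L,)
        complementary = (weights[:, None, None, None] * inter).sum(dim=0)
        # Fuse the complementary representation into the final-layer output
        g = torch.sigmoid(self.gate(torch.cat([final, complementary], dim=-1)))
        return final + g * complementary                    # enriched representation


# Usage sketch: encode a question/passage pair for extractive MRC
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
enc = tokenizer("Who released SQuAD?", "SQuAD was released by Stanford.",
                return_tensors="pt")
model = LayerFusionEncoder()
reps = model(enc["input_ids"], enc["attention_mask"])
print(reps.shape)  # feed these representations to a span-prediction head
```

In a full MRC system the enriched representation would feed a span-prediction head (single-span for SQuAD, multi-span for MUSST); HIRE's dynamic computation of the complementary representation is more involved than the simple learned per-layer weighting used in this sketch.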
Keywords/Search Tags:natural language processing, machine reading comprehension, pre-trained language model, hidden representation, multi-span style extraction