Research On Text Keyphrase Generation Method Based On Pre-trained Language Model

Posted on:2022-12-01

Degree:Master

Type:Thesis

Country:China

Candidate:Y Y Wang

GTID:2518306746481324

Subject:Automation Technology

Abstract/Summary:

Keyphrase generation methods produce keyphrases that represent the subject and main meaning of a text or document. Most current keyphrase generation methods use a recurrent network structure, which struggles with long-distance dependencies in text and whose sequential nature precludes parallelization across training samples. They also suffer from inaccurate word embedding representations, poor generalization, and high training cost. These problems limit further improvement in keyphrase generation performance. To address these issues, this thesis makes the following contributions:

(1) To address the long-distance dependency limitation and inaccurate word embedding representation, a text keyphrase generation model based on XLNet, Score XLNet, is proposed. It is an encoder-decoder framework that leverages the rich semantic features of XLNet, trained on massive data, to improve performance on keyphrase generation tasks. The model first uses XLNet to extract important sentences and then uses the title to guide the pre-trained encoder in collecting information about each word in those sentences. In addition, a character-level reinforcement learning reward mechanism based on phrase prefix matching is introduced to alleviate the inconsistency between training mode and testing mode. Experiments on five public datasets show that the algorithm effectively alleviates these problems and improves model performance.

(2) To address keyphrase generation under high training cost and in low-resource scenarios, a keyphrase generation model, Score XLNet-GAN, is proposed, based on a pre-trained language model and a generative adversarial network structure. Both the generator and the discriminator of this model are pre-trained language models. The generator produces a series of keyphrases for the input document, and the discriminator tries to distinguish machine-generated keyphrases from manually labeled ones. A new discriminator structure for sequence classification, based on an improved BERT pre-trained language model, is also proposed. The model works well even when only 1% of the standard training dataset is used for learning.
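
The abstract does not give the formula for the character-level reward based on phrase prefix matching, so the following is only an illustrative sketch of what such a reward could look like. It assumes the reward is the length of the longest shared character prefix between a generated phrase and the best-matching reference phrase, normalized by the longer of the two. The function names `common_prefix_len` and `prefix_reward`, the lowercasing step, and the normalization are all assumptions made for illustration and are not taken from the thesis.

```python
from typing import Sequence


def common_prefix_len(a: str, b: str) -> int:
    """Count how many leading characters two strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_reward(generated: str, references: Sequence[str]) -> float:
    """Hypothetical character-level prefix-matching reward in [0, 1].

    Assumption: the score is the shared prefix length divided by the
    longer of the two phrases, maximized over all reference phrases.
    """
    gen = generated.strip().lower()
    if not gen:
        return 0.0
    best = 0.0
    for ref in references:
        ref_norm = ref.strip().lower()
        if not ref_norm:
            continue
        score = common_prefix_len(gen, ref_norm) / max(len(gen), len(ref_norm))
        best = max(best, score)
    return best


if __name__ == "__main__":
    gold = ["neural network", "keyphrase generation"]
    for phrase in ["neural network", "neural net", "graph model"]:
        print(phrase, "->", round(prefix_reward(phrase, gold), 3))
```

A reward of this shape gives partial credit to a phrase that starts correctly but diverges later, which is one plausible way for a partially decoded phrase to receive a training signal before it is complete.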

Keywords/Search Tags:

Keyphrase generation, Sentence extraction, Pre-trained language model, GAN, RL

Related items

1	Research On Keyphrase Generation Based On Text Structure Information Enhancement And Pre-Trained Language Model
2	Research On Citation Context Recognition Based On Pre-trained Language Model
3	Research On Automatic Keyphrase Technology In Academic Corpus
4	Design And Implementation Of Text Resource Sharing System Based On Keyphrase Extraction
5	Research On Chinese Text Summary Generation Based On Pre-trained Language Model
6	Research On Few-shot Text Generation With Pre-trained Language Model
7	Research On Question Generation For Tibetan Machine Reading Comprehensio
8	Research On Keyword Extraction Method Based On Semantics Features
9	Research On Keyphrase Generation Method Based On Document Structure Information
10	Research On Abstractive Text Summarization Based On Pre-trained Language Model