
Research On Keyphrase Generation Based On Text Structure Information Enhancement And Pre-Trained Language Model

Posted on: 2024-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y G Zhai
Full Text: PDF
GTID: 2568307100988729
Subject: Computer Science and Technology
Abstract/Summary:
Keyphrase generation aims to produce a set of concise, general phrases that capture the core content of a given long text. As an indispensable component of many natural language processing tasks, keyphrase generation has attracted much attention in recent years, and a variety of generative methods have been developed. This thesis systematically reviews the development of keyphrase generation technology and finds that problems remain, such as poor generation results and high model-training complexity. Inspired by prior work, this thesis proposes the following two improvements to keyphrase generation:

(1) To address the poor quality of generated keyphrases and the omission of emphasis in long texts, this thesis proposes a keyphrase generation model guided by key sentences and the title. The model adopts the Seq2Seq architecture and attends to sentence-level structural information and the title. First, by learning implicit sentence representations, the model selects the key sentences that positively influence the generated results and marks each word in those sentences as important. The encoder then fuses these markers with the input text sequence to produce a context representation carrying the importance markers. Finally, at the decoding end, an attention layer matches and aggregates the title information with the encoder output, jointly guiding the decoder's predictions. The experimental results show that the proposed method addresses these problems well, improving performance on five public datasets, especially the large-scale KP20k dataset. Compared with the CopyRNN model, the proposed model improves present-keyphrase and absent-keyphrase performance by about 18.2% and 31.5% respectively, demonstrating a better generation effect.

(2) To address the high complexity of model training and the poor quality of absent-keyphrase generation, this thesis proposes a multi-task keyphrase generation method based on BART, a pre-trained language model. The method divides the generation task into present-keyphrase extraction and absent-keyphrase generation. The extraction task is treated as a sequence labeling problem: key sentences are selected and the model is fine-tuned to improve the representations of the words in those sentences. In addition, the key-information vectors from the extraction task are shared with the generation task to enhance the generative model's performance. The experimental results show that the model performs well on all five datasets: on KP20k it outperforms the CatSeqTG-2RF model by 17.1%, its absent-keyphrase results are 9.6% higher than those of the BERT-AKG model trained in the same way, and the use of a pre-trained model also substantially reduces the complexity of model training.
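The key-sentence selection and word-marking step of method (1) can be sketched as follows. This is a minimal illustration only: the thesis learns implicit sentence representations to score sentences, whereas here a simple title-word-overlap heuristic stands in for that learned scorer, and tokenization is naive whitespace splitting.

```python
# Sketch of key-sentence selection and importance marking (method 1).
# Assumption: title-overlap scoring replaces the learned sentence scorer
# described in the thesis; whitespace tokenization replaces real tokenization.

def select_key_sentences(sentences, title, top_k=2):
    """Score each sentence by word overlap with the title; keep the top_k."""
    title_words = set(title.lower().split())
    scored = [(len(title_words & set(s.lower().split())), i)
              for i, s in enumerate(sentences)]
    scored.sort(reverse=True)
    return sorted(i for _, i in scored[:top_k])

def mark_important_words(sentences, key_indices):
    """Attach a binary importance marker to every word (1 = in a key sentence).

    The marked sequence is what the encoder would fuse with the input text
    to build a context representation carrying importance markers.
    """
    key = set(key_indices)
    marked = []
    for i, s in enumerate(sentences):
        flag = 1 if i in key else 0
        marked.extend((word, flag) for word in s.split())
    return marked
```

In the full model these binary markers would be embedded and added to the token embeddings before encoding, rather than carried as plain tuples.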
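The multi-task split of method (2) can likewise be sketched at the data-preparation level: gold keyphrases are divided into present phrases (appearing verbatim in the text, handled by the extraction sub-task as B/I/O sequence labeling) and absent phrases (handled by the generation sub-task). This is an assumed illustration of the task decomposition, not the thesis's implementation; whitespace tokenization and exact lowercase matching are simplifying assumptions.

```python
# Sketch of the present/absent split and the sequence-labeling targets
# used by the extraction sub-task (method 2). Naive whitespace matching
# is an assumption made for brevity.

def split_keyphrases(tokens, keyphrases):
    """Divide gold keyphrases into present (in-text) and absent groups."""
    present, absent = [], []
    lowered = [t.lower() for t in tokens]
    for kp in keyphrases:
        kp_toks = kp.lower().split()
        n = len(kp_toks)
        found = any(lowered[i:i + n] == kp_toks
                    for i in range(len(lowered) - n + 1))
        (present if found else absent).append(kp)
    return present, absent

def bio_labels(tokens, present_keyphrases):
    """Build B/I/O tags over the token sequence for the extraction sub-task."""
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for kp in present_keyphrases:
        kp_toks = kp.lower().split()
        n = len(kp_toks)
        for i in range(len(lowered) - n + 1):
            # Only tag spans not already claimed by another keyphrase.
            if lowered[i:i + n] == kp_toks and all(l == "O" for l in labels[i:i + n]):
                labels[i] = "B"
                labels[i + 1:i + n] = ["I"] * (n - 1)
    return labels
```

Present phrases then train the labeling head, while absent phrases become decoder targets; sharing the extraction-side representations with the decoder is what links the two tasks.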
Keywords/Search Tags: keyphrase generation, Seq2Seq, attention mechanism, pre-trained language model