
Research On Keyphrase Generation Based On Text Structure Information Enhancement And Pre-Trained Language Model

Posted on: 2024-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y G Zhai
Full Text: PDF
GTID: 2568307100988729
Subject: Computer Science and Technology
Abstract/Summary:
Keyphrase generation aims to produce a set of concise, general phrases that capture the core content of a given long text. As an indispensable component of many natural language processing tasks, keyphrase generation has attracted much attention in recent years, and a variety of generative methods have been developed. This thesis systematically reviews the development of keyphrase generation technology and finds that problems remain, such as poor generation results and high model-training complexity. Inspired by prior work, this thesis proposes the following two improvements to keyphrase generation:

(1) To address the poor quality of generated keyphrases and the omission of emphasis in long texts, this thesis proposes a keyphrase generation model guided by key sentences and the title. The model adopts the Seq2Seq architecture and attends to sentence-level structural information and the title. First, by learning implicit sentence representations, the model selects the key sentences that positively influence the generated results and marks each word in those sentences as important. The encoder then fuses these markers with the input text sequence to produce a context representation carrying the importance markers. Finally, at the decoding end, an attention layer matches and aggregates the title information with the encoder output, jointly guiding the decoder's predictions. The experimental results show that the proposed method addresses these problems well, improving performance on five public datasets, especially the large-scale KP20k dataset. Compared with the CopyRNN model, the proposed model improves present-keyphrase and absent-keyphrase performance by about 18.2% and 31.5% respectively, demonstrating a better generation effect.

(2) To address the high complexity of model training and the poor quality of absent-keyphrase generation, this thesis proposes a multi-task keyphrase generation method based on BART, a pre-trained language model. The method divides the generation task into present-keyphrase extraction and absent-keyphrase generation. The extraction task is treated as a sequence labeling problem: key sentences are selected and the model is fine-tuned to improve the representations of the words in those sentences. In addition, the key-information vectors from the extraction task are shared with the generation task to enhance the generative model's performance. The experimental results show that the model performs well on all five datasets: on KP20k it outperforms the CatSeqTG-2RF model by 17.1%, its absent-keyphrase results are 9.6% higher than those of the BERT-AKG model trained in the same way, and the use of a pre-trained model also substantially reduces the complexity of model training.
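The key-sentence selection and word-marking step of method (1) can be sketched as follows. This is a minimal illustration only: the thesis learns implicit sentence representations to score sentences, whereas here a simple title-word-overlap heuristic stands in for that learned scorer, and tokenization is naive whitespace splitting.

```python
# Sketch of key-sentence selection and importance marking (method 1).
# Assumption: title-overlap scoring replaces the learned sentence scorer
# described in the thesis; whitespace tokenization replaces real tokenization.

def select_key_sentences(sentences, title, top_k=2):
    """Score each sentence by word overlap with the title; keep the top_k."""
    title_words = set(title.lower().split())
    scored = [(len(title_words & set(s.lower().split())), i)
              for i, s in enumerate(sentences)]
    scored.sort(reverse=True)
    return sorted(i for _, i in scored[:top_k])

def mark_important_words(sentences, key_indices):
    """Attach a binary importance marker to every word (1 = in a key sentence).

    The marked sequence is what the encoder would fuse with the input text
    to build a context representation carrying importance markers.
    """
    key = set(key_indices)
    marked = []
    for i, s in enumerate(sentences):
        flag = 1 if i in key else 0
        marked.extend((word, flag) for word in s.split())
    return marked
```

In the full model these binary markers would be embedded and added to the token embeddings before encoding, rather than carried as plain tuples.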
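The multi-task split of method (2) can likewise be sketched at the data-preparation level: gold keyphrases are divided into present phrases (appearing verbatim in the text, handled by the extraction sub-task as B/I/O sequence labeling) and absent phrases (handled by the generation sub-task). This is an assumed illustration of the task decomposition, not the thesis's implementation; whitespace tokenization and exact lowercase matching are simplifying assumptions.

```python
# Sketch of the present/absent split and the sequence-labeling targets
# used by the extraction sub-task (method 2). Naive whitespace matching
# is an assumption made for brevity.

def split_keyphrases(tokens, keyphrases):
    """Divide gold keyphrases into present (in-text) and absent groups."""
    present, absent = [], []
    lowered = [t.lower() for t in tokens]
    for kp in keyphrases:
        kp_toks = kp.lower().split()
        n = len(kp_toks)
        found = any(lowered[i:i + n] == kp_toks
                    for i in range(len(lowered) - n + 1))
        (present if found else absent).append(kp)
    return present, absent

def bio_labels(tokens, present_keyphrases):
    """Build B/I/O tags over the token sequence for the extraction sub-task."""
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for kp in present_keyphrases:
        kp_toks = kp.lower().split()
        n = len(kp_toks)
        for i in range(len(lowered) - n + 1):
            # Only tag spans not already claimed by another keyphrase.
            if lowered[i:i + n] == kp_toks and all(l == "O" for l in labels[i:i + n]):
                labels[i] = "B"
                labels[i + 1:i + n] = ["I"] * (n - 1)
    return labels
```

Present phrases then train the labeling head, while absent phrases become decoder targets; sharing the extraction-side representations with the decoder is what links the two tasks.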
Keywords/Search Tags: keyphrase generation, Seq2Seq, attention mechanism, pre-trained language model