
Text Keyphrase Generation Method Based On Deep Learning

Posted on: 2021-01-25 | Degree: Master | Type: Thesis
Country: China | Candidate: J Zhao
GTID: 2428330611468933 | Subject: Aeronautical Engineering
Abstract/Summary:
A keyphrase distills the topic information of a text. It helps readers quickly grasp the core content of an article and is widely used in information retrieval, question answering systems, text classification, and other fields. Compared with traditional keyphrase extraction methods, deep learning-based keyphrase generation methods can not only generate keyphrases that do not appear in the original text but also learn the underlying semantic information of the keyphrases in the document. This research therefore focuses on the application of deep learning to keyphrase generation and proposes several improved algorithms. The contributions of this work are as follows:

(1) Explore keyphrase generation based on the sequence-to-sequence (Seq2Seq) framework. The Seq2Seq framework incorporates the attention mechanism and the copying mechanism to implement the keyphrase generation model CopyRNN. A comparison of six classic extraction algorithms and CopyRNN on five data sets shows that the generation algorithm performs substantially better than the extraction algorithms.

(2) Identify and analyze a deficiency of CopyRNN, namely that it generates overlapping phrases, and propose ParaNet, a keyphrase generation algorithm based on parallel deep learning networks, to address it. The architecture is more complex and comprises parallel encoders and parallel decoders. The two parallel encoders independently encode the text sequence and its corresponding syntactic labels. The parallel decoders use a multi-task framework, which enables the model to jointly learn the word decoding task and the syntactic label decoding task. Experimental results show that ParaNet not only improves performance substantially over CopyRNN but also alleviates the problem of generating overlapping phrases. In addition, cross-domain tests show that ParaNet can learn the common features between semantics and syntax and generalizes well.

(3) Identify and study in depth two problems of CopyRNN: it complicates the generation of present keyphrases and weakens the generation of absent keyphrases. An easy-to-hard learning paradigm is proposed to address these problems. A hierarchical decoding network (H-Net) fused with a coverage vector implements this learning strategy. The hierarchical network consists of an easy decoder at the lower layer and a hard decoder at the upper layer. Comprehensive experiments show that the hierarchical network model outperforms the latest keyphrase generation methods on the keyphrase generation task. In addition, a keyphrase-guided title generation experiment verifies the generalization and effectiveness of the easy-to-hard learning strategy in other generation tasks.
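
The distinction between present and absent keyphrases can be illustrated with a small sketch. It assumes the usual convention that a present keyphrase appears in the source text as a contiguous token sequence and an absent keyphrase does not. The function names and the whitespace tokenizer are hypothetical and are not taken from the thesis; real pipelines typically also apply stemming before matching.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def contains_phrase(doc_tokens, phrase_tokens):
    """Return True if phrase_tokens occurs contiguously inside doc_tokens."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


def split_keyphrases(document, keyphrases):
    """Partition keyphrases into (present, absent) relative to the document."""
    doc_tokens = tokenize(document)
    present, absent = [], []
    for phrase in keyphrases:
        if contains_phrase(doc_tokens, tokenize(phrase)):
            present.append(phrase)
        else:
            absent.append(phrase)
    return present, absent


if __name__ == "__main__":
    doc = "We study keyphrase generation with deep learning models"
    gold = ["keyphrase generation", "deep learning", "neural networks"]
    present, absent = split_keyphrases(doc, gold)
    print("present:", present)
    print("absent:", absent)
```

An extraction method can only ever return phrases from the first group, whereas a generation model is also evaluated on the second group.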
Keywords/Search Tags: keyphrase generation, deep learning, Seq2Seq, parallel network, hierarchical network