Research And Application Of Abstract Method Of Chinese Web Text Based On Seq2Seq Framework

Posted on: 2024-08-01  Degree: Master  Type: Thesis
Country: China  Candidate: L M Meng  Full Text: PDF
GTID: 2568307118953429  Subject: Electronic information
Abstract/Summary:
With the spread of internet technology, massive amounts of text data are being generated. Enabling people to quickly retrieve key information from this mass of text, and thereby to read and search more efficiently, has become particularly important. By implementation principle, automatic text summarization falls into two main types, abstractive (generative) and extractive, which suit different scenarios. At present, abstractive models based on the Seq2Seq framework, which read the full input text and summarize its main content in concise language, are the main technical approach to abstractive summarization. However, the traditional Seq2Seq model is weak at recalling the key information of a text, has an insufficient grasp of its contextual semantics, and is trained with a method that is poorly suited to the task. This thesis proposes solutions to these problems and, driven by practical application needs, completes the following work:

(1) The summary generation model of the traditional Seq2Seq framework cannot fully learn the semantic information of the full text. An improved DGCNN network model based on the Seq2Seq framework with RoBERTa pre-training is proposed. After the semantic information of the full text is captured through multiple attention mechanisms and the DGCNN network, a probability distribution over the words that carry the key lexical information of the article is obtained, and the decoding process is optimized with a sequence-labeling copy mechanism, improving the recall of key semantic information in the generated summaries. To address the "exposure bias" caused by traditional teacher-forcing training, the model is trained with a combination of cross-entropy loss, contrastive learning, and reinforcement learning, which avoids both inadequate training and exposure bias. Experimental comparison shows that the proposed model effectively extracts the semantic information of the full text, recalls its key information, and improves the quality of the generated summaries.

(2) In practical deployments of the summarization task, understanding long texts and generating summaries for them has long been a pain point in the industry. To meet practical needs, this thesis proposes a hybrid summarization method that combines extraction and generation. The maximum encoder and decoder lengths of the proposed generative model are adjusted so that it can produce effective summaries of medium-length and long texts. For very long texts, a TextRank extraction algorithm, optimized in both its word embeddings (Word2vec) and its convergence process, first reduces the text length; the shortened text, which retains the key information of the original, is then fed to the generative model to produce an accurate summary. The effectiveness of the hybrid model is verified through experimental comparison and case analysis on texts of different lengths.

(3) An automatic text summarization system is built on the proposed generative model and the optimized TextRank algorithm. The system applies different summarization strategies to texts of different lengths, completing its design and implementation.
Keywords/Search Tags:text summarization, Seq2Seq framework, Textrank algorithm, Hybrid summary model
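
The hybrid strategy described in the abstract (shorten very long texts with TextRank, then pass the result to the generative model) can be illustrated with a minimal Python sketch. It is not the thesis implementation. Several things here are assumptions made for illustration: sentence similarity uses bag-of-token cosine similarity as a stand-in for averaged Word2vec embeddings, the damping factor 0.85 and the 512-character input limit are arbitrary defaults, and `generate` is a hypothetical callable standing in for the Seq2Seq summarizer.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Split after Chinese or Western sentence-ending punctuation.
    parts = re.split(r"(?<=[。！？!?.])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    # Latin words/numbers as tokens, each CJK character as its own token.
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two token-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def textrank(sentences, damping=0.85, tol=1e-6, max_iter=100):
    # Power iteration over a weighted sentence-similarity graph.
    n = len(sentences)
    if n == 0:
        return []
    vecs = [Counter(tokenize(s)) for s in sentences]
    sim = [
        [cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = []
        for i in range(n):
            rank = sum(
                sim[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new.append((1 - damping) / n + damping * rank)
        delta = sum(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:  # stop once the scores have converged
            break
    return scores


def compress(text, max_chars):
    # Keep the highest-scoring sentences that fit the budget, in original order.
    sentences = split_sentences(text)
    scores = textrank(sentences)
    kept, used = set(), 0
    for i in sorted(range(len(sentences)), key=lambda k: -scores[k]):
        if used + len(sentences[i]) <= max_chars:
            kept.add(i)
            used += len(sentences[i])
    return " ".join(sentences[i] for i in sorted(kept))


def hybrid_summarize(text, generate, max_input_chars=512):
    # Short inputs go straight to the generator; long ones are shortened first.
    if len(text) <= max_input_chars:
        return generate(text)
    return generate(compress(text, max_input_chars))
```

Passing the generator in as a callable keeps the length-based dispatch independent of any particular model, so the extraction step can be checked on its own. For Chinese text a real system would use proper word segmentation and trained embeddings rather than the character-level tokens used here.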