
Research On Seq2Seq Model Optimization For Automatic Text Summarization

Posted on: 2022-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: R Xiao
Full Text: PDF
GTID: 2531306488480944
Subject: Engineering
Abstract/Summary:
The volume of aviation safety reports is large and the reports themselves are long; extracting the important information from so much text has become a pressing problem. Automatic text summarization alleviates this information overload by compressing a document down to its most important content. Machine translation and text summarization are both sequence-transduction problems, but summarization must capture the core information of the input text, and there is no clear alignment between input and output. Although sequence-to-sequence (Seq2Seq) models perform outstandingly in machine translation, they still face problems in text summarization: they are poorly suited to long texts, their network structures are complex, their time complexity is high, and sample labeling is difficult, so further improvement is needed before practical use. In view of these problems, this thesis reviews and optimizes prior research methods. The main research work is as follows:

1. The encoder-decoder Seq2Seq model is the most widely applied model for abstractive summarization, but the traditional Seq2Seq model suffers from repetition and off-topic output. This thesis proposes two optimizations. The first is global information encoding, which uses convolution and a self-attention mechanism to obtain global information about the source text and pass it to the decoder. The second is topic information decoding, which extracts important entities from the source text and encodes them as topic vectors that help the decoder focus on the most relevant information, improving the faithfulness and reliability of the generated summaries. Experiments on LCSTS show that, compared with the baseline model, both global encoding and topic decoding improve the various ROUGE metrics, and the model combining the two improves them further.

2. Based on experience, when people summarize
long texts, they usually find the key sentences first and then restate the information of the full text in their own words. Inspired by this, an accurate and fast summarization model is proposed: it first selects salient sentences and then rewrites them to generate a concise summary. While maintaining language fluency, a novel sentence-level gradient method is adopted to build, in a hierarchical manner, a bridge across the non-differentiable computation between the two neural networks. To explore the application of automatic text summarization in the aviation domain, a Chinese aviation safety report dataset was manually curated for the experiments, and the model trained on it achieved the best summarization scores. In addition, by operating first at the sentence level and then at the word level, the generation model can decode in parallel, increasing generation speed.
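The global information encoding described in point 1 combines convolution with self-attention to filter the encoder states before they reach the decoder. The following NumPy sketch is purely illustrative, not the thesis's implementation: the kernel size, the depthwise convolution, the sigmoid-gate form, and all dimensions are assumptions chosen to show the general shape of such a gating mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(h):
    """Scaled dot-product self-attention over encoder states h: (T, d)."""
    d = h.shape[1]
    scores = h @ h.T / np.sqrt(d)   # (T, T) pairwise similarities
    return softmax(scores) @ h      # (T, d) global context per position

def global_encoding_gate(h, w):
    """Depthwise 1-D convolution (kernel 3, same padding) over h, fused
    with the self-attention context into a sigmoid gate that filters h."""
    padded = np.pad(h, ((1, 1), (0, 0)))
    conv = sum(padded[k:k + len(h)] * w[k] for k in range(3))  # (T, d)
    gate = 1.0 / (1.0 + np.exp(-(conv + self_attention(h))))   # in (0, 1)
    return h * gate  # gated encoder states passed on to the decoder

T, d = 5, 8
h = rng.normal(size=(T, d))         # toy encoder hidden states
w = rng.normal(size=(3, d)) * 0.1   # toy depthwise conv kernel
g = global_encoding_gate(h, w)
```

Because the gate is a sigmoid, each gated state is an attenuated copy of the original: positions the convolution and attention deem uninformative are suppressed rather than passed to the decoder at full strength.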
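The extract-then-rewrite procedure of point 2 can be illustrated with a toy pipeline. In this sketch, a simple lexical-salience score stands in for the learned extractor and a trivial `rewrite` callable stands in for the learned generator (both are hypothetical stand-ins); the sentence-level reinforcement-learning bridge that trains the two networks jointly is omitted entirely. The key structural point survives, though: once sentences are selected, each is rewritten independently, which is what makes parallel decoding possible.

```python
from collections import Counter

def extract_then_rewrite(sentences, rewrite, k=2):
    """Toy extract-then-rewrite summarizer (illustrative stand-in only)."""
    # document frequency of each word across sentences
    df = Counter(w for s in sentences for w in set(s.lower().split()))

    # salience: fraction of a sentence's words shared with other sentences
    def salience(s):
        words = s.lower().split()
        return sum(df[w] > 1 for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: salience(sentences[i]), reverse=True)
    keep = sorted(ranked[:k])  # top-k sentences, restored to document order
    # each selected sentence is rewritten independently of the others,
    # so in a neural version the rewrites can be decoded in parallel
    return [rewrite(sentences[i]) for i in keep]

report = [
    "the aircraft reported a hydraulic warning during climb",
    "the crew of the aircraft declared a precautionary return",
    "weather at the field was clear all afternoon",
]
summary = extract_then_rewrite(report, rewrite=lambda s: s.capitalize() + ".", k=2)
# the two aircraft-related sentences outscore the weather filler
```

In the thesis, the extractor and generator are separate networks and the non-differentiable selection step between them is bridged with a sentence-level gradient (reinforcement-learning) signal; that training loop is outside the scope of this sketch.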
Keywords/Search Tags: Global encoding, Topic decoding, Reinforcement learning, Extractor, Generator