
Research on Short Text Summary Generation Based on Text Structure Information

Posted on: 2020-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: R S Wu
GTID: 2428330578479398
Subject: Software engineering
Abstract/Summary:
As an effective means of alleviating information overload, automatic summarization has long been a research hotspot in natural language processing. Because existing neural network methods cannot effectively encode the semantics of long texts, current mainstream abstractive summarization methods focus mainly on short texts, and the input text is encoded by an encoder based on a recurrent neural network. What such an encoder learns is mainly the sequential information of the input text; it makes little use of the structural information the text contains, such as its physical structure and semantic structure. This thesis studies how to use the structural information contained in the text to improve the accuracy of the generated summary, in three respects.

First, we propose a method for integrating the physical structure information of the text. The physical hierarchical structure of a text helps to judge the semantic information and importance of its different structural units more accurately. We therefore propose a hierarchical text reader that encodes the text according to its physical hierarchy, and a semantic fusion unit that fuses the semantic information from the different levels of the input text into the final text representation passed to the decoder. The experimental results show that system performance improves significantly under ROUGE evaluation.

Second, we propose a method for integrating the semantic structure information of the text. We take named entities represented by BIO tags as word-level structure information and the dependency syntax structure as sentence-level structure information; together these form shallow semantic structure information that enriches the semantic features of the encoder. By extending the traditional encoder-decoder model, the system generates a summary centered on the core entity. The experimental results show that introducing shallow semantic structure information improves system performance.

Finally, we explore how to learn and use the implicit structure information of the text adaptively, and propose an abstractive summarization model based on the implicit structure of the text. The model introduces a text self-matching mechanism at the encoder stage. For each word, this mechanism collects and incorporates implicit structure information from the whole text according to the relevance of every other word to the current word, and a global gating unit then extracts the core content of the text. The experimental results show a significant improvement under ROUGE evaluation, which indicates that the proposed model can effectively learn and use the implicit structure information of the text.
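The self-matching and gating idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's actual model. It assumes plain dot-product relevance and a parameter-free scalar gate, whereas a trained model would use learned weights, and the names `self_match` and `global_gate` are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw relevance scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def self_match(states):
    """For each word state, build a context vector from the whole text.

    Assumption: relevance between two words is their dot product.
    """
    contexts = []
    for query in states:
        weights = softmax([dot(query, key) for key in states])
        context = [
            sum(w * state[d] for w, state in zip(weights, states))
            for d in range(len(query))
        ]
        contexts.append(context)
    return contexts


def global_gate(states, contexts):
    """Scale each word state by a scalar gate in (0, 1).

    Assumption: the gate is sigmoid(state . context), with no learned
    parameters; this only illustrates the filtering step.
    """
    gated = []
    for state, context in zip(states, contexts):
        gate = sigmoid(dot(state, context))
        gated.append([gate * x for x in state])
    return gated


# Toy encoder states for a three-word text (made-up values).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contexts = self_match(states)
gated = global_gate(states, contexts)
```

Each context vector is a weighted average of all word states, so every word sees the whole text, and the gate then shrinks each state according to how well it agrees with its context.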
Keywords/Search Tags: Automatic Text Summarization, Neural Network, Encoder-Decoder, Text Structure Information