
Research On Content Semantic Analysis Based Text Summarization Methods

Posted on: 2023-07-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Chen
GTID: 1528306839479504
Subject: Computer application technology
Abstract/Summary:
Automatic text summarization is one of the most critical and challenging tasks in natural language processing. It aims to automatically produce a fluent and concise summary from a text or a collection of texts. Application scenarios include news headline generation, automatic report generation, and meeting summarization. With the rise of deep learning, research on text summarization has made breakthrough progress. Although existing methods can extract or generate usable summaries, they face the following challenges: 1) corpora for Chinese summarization are few, and large-scale Chinese long-text corpora are even scarcer; 2) sequential summarization models lack analysis of document structure, which easily causes the loss of long-distance dependency information; 3) sequential generation models lack semantic analysis of the context topic, so the generated summaries deviate from the topic; 4) most text summarization and generation methods lack semantic analysis of factual details. To overcome these challenges, this dissertation studies automatic text summarization methods based on content semantic analysis. The main research content is as follows.

To alleviate the shortage of Chinese summarization datasets, an automatic method for constructing a large-scale Chinese long-text summarization corpus is proposed. First, Weibo and web crawlers are employed to collect <news article, Weibo post> pairs that are naturally annotated by popular media users. Then, these pairs are filtered by several constraints and the average summary metric. Finally, to demonstrate the usability of the constructed CLES dataset, several published methods are evaluated on it. The experimental results show that the corpus constructed by the proposed method is more challenging, and they also provide reliable benchmark results for further research on Chinese summarization.

To remedy the lack of document structure analysis in sequential methods, a hierarchical graph reasoning method for extractive summarization is proposed. In this method, sentence vectors are represented as nodes, and word nodes are added to help construct the connections between sentences. The semantics of each sentence are derived through the connections between graph nodes. This not only addresses the information loss caused by the incomplete representation of long texts, but also reduces the information attenuation caused by positional distance. Then, a relational graph network is used to model the sentences efficiently, which yields differentiated sentence representations. Finally, the key sentences are selected by a sentence selector. Experimental results on the extractive summarization task show that the hierarchical graph reasoning method can effectively encode long texts and improve the semantic interaction between sentences. It outperforms published conventional neural network methods and pre-training methods on various evaluation metrics, which further demonstrates the effectiveness of the proposed method.

To address the lack of semantic analysis of the context topic in existing summarization methods, a topic-sentence guided abstractive summarization method is proposed. In this method, a trained extractive summarization model is employed to extract sentences from the document as the topic. Then, an additional topic encoder is proposed to represent the topic. To generate a summary that is more relevant to the topic, a novel weighted attention mechanism is proposed. Experimental results on the CNN/Daily Mail dataset show that the proposed method achieves significant improvements over published methods. The results also show that the proposed method is better at generating summaries that stay close to the topic of the article.

To address the lack of semantic analysis of factual details in conventional summarization and text generation methods, this dissertation proposes a text generation method based on dynamic context content planning. On this basis, a generation record reconstruction mechanism is proposed to re-optimize the generated text. First, the method dynamically selects the source data according to the previously generated text. Second, the proposed reconstruction mechanism encourages the decoder to obtain more accurate information from the encoder. Experimental results on the ROTOWIRE and NBAZHN datasets show that the proposed method is significantly better than published methods. The results of a manual evaluation using best-worst scaling show that the text generated with the dynamic selection mechanism is of higher quality than that of the static selection method. In addition, a case study shows that the reconstruction mechanism helps improve the fidelity of the generated text.

In summary, this dissertation focuses on text summarization methods based on content semantic analysis. For the four existing challenges in the related tasks, it proposes, respectively, an automatic method for constructing a large-scale Chinese long-text summarization corpus, a hierarchical graph reasoning model for extractive summarization, a topic-sentence guided abstractive summarization method, and a dynamic context planning algorithm for data-to-text generation. The methods are validated on their corresponding datasets and achieve new state-of-the-art performance.
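
The abstract describes a graph in which sentence nodes are connected through shared word nodes, so that sentences far apart in the text can still exchange information. The following is a minimal sketch of that idea only, not the dissertation's model: the function names, the regex tokenizer, and the stopword set are all assumptions made for illustration, and no graph neural network or sentence selector is included.

```python
import re
from collections import defaultdict


def build_sentence_word_graph(sentences, stopwords=frozenset()):
    """Build a bipartite sentence-word graph (hypothetical helper).

    Returns two adjacency maps: sentence index -> set of words,
    and word -> set of sentence indices.
    """
    sent_to_words = {}
    word_to_sents = defaultdict(set)
    for idx, sentence in enumerate(sentences):
        # Assumption: a simple regex tokenizer. Chinese text would need
        # a proper word segmenter instead.
        tokens = re.findall(r"\w+", sentence.lower())
        words = {t for t in tokens if t not in stopwords}
        sent_to_words[idx] = words
        for word in words:
            word_to_sents[word].add(idx)
    return sent_to_words, dict(word_to_sents)


def sentence_neighbors(sent_to_words, word_to_sents):
    """For each sentence, list sentences reachable via a shared word node."""
    neighbors = {}
    for idx, words in sent_to_words.items():
        linked = set()
        for word in words:
            linked |= word_to_sents[word]
        linked.discard(idx)  # a sentence is not its own neighbor
        neighbors[idx] = linked
    return neighbors


sentences = [
    "The model encodes long documents.",
    "Graph nodes represent sentences.",
    "Long documents lose distant context.",
]
s2w, w2s = build_sentence_word_graph(sentences, stopwords=frozenset({"the"}))
neighbors = sentence_neighbors(s2w, w2s)
print(neighbors)
```

In this toy input, the first and third sentences are not adjacent in the text, yet they become direct neighbors through the shared word nodes "long" and "documents". This is the sense in which word nodes can shorten the path between distant sentences.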
Keywords/Search Tags: Automatic text summarization, Construction of the large-scale Chinese long-text dataset, Hierarchical graph reasoning model, Topic-aware, Dynamic content plan, Reconstruction of generated history