
Research On Methods Of Improving Semantic Coherence Of Text Summarization

Posted on: 2022-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: L J Wei
Full Text: PDF
GTID: 2518306563980059
Subject: Communication and Information System
Abstract/Summary:
Automatic text summarization is an important research direction in natural language processing. It produces a brief representation of a source text by extracting, refining, and condensing its main content. Because the text sequences faced by the summarization task are complex and diverse, especially in Chinese, the summaries generated by automatic methods still differ considerably from reference summaries. Meanwhile, the widely used evaluation metrics judge generated text only by superficial lexical overlap, so their scores do not effectively reflect the quality of a summary. In view of these problems, this thesis studies how to improve the semantic coherence of Chinese text summarization and how to objectively reflect the quality of summarization results. This work is supported by the National Key Research and Development Program of China under Grant 2018YFC0831300. The main research work is as follows:

(1) To address the problem that a generated summary may fail to capture the core idea of the source text, a text summarization method based on word importance is proposed, which applies a graph structure to the judgment of word importance. A directed graph is constructed from the attention distribution of the encoder's last layer, and an importance score for each word in the source text is computed from this graph. The scores are then used to correct the attention over the source text during decoding. The proposed method encourages every decoding time step to attend to the important words in the source text, which improves semantic coherence and helps avoid inconsistency between the generated summary and the source text.

(2) To address the problem of unknown (out-of-vocabulary) words in current generative summarization systems, a copy mechanism that incorporates word importance is introduced into the traditional generative network, so that the system produces summaries by combining extraction and generation, which reduces the impact of unknown words on readability. At the same time, to address the problem of repeated words in generated summaries, an attention distribution penalty mechanism is proposed to avoid attending to the same location of the source text more than once during decoding. In addition, the semantic similarity between the source text and the reference summary is used to supervise the whole network during training, which improves the quality of the generated summaries.

(3) In view of the limitations of current methods for evaluating generated text, semantic representation is applied to text evaluation, and a summary evaluation method based on semantic similarity is proposed. Word representations are first obtained through power-mean and concatenation operations, and the semantic representation vector of each phrase is then computed using word importance as the weight. The semantic similarity between phrases is measured by cosine similarity, and a greedy algorithm is finally used to compute the similarity score between the two texts. Compared with evaluation methods based on lexical overlap, this method evaluates the quality of generated text at the semantic level and has higher reliability.
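The evaluation procedure in item (3) can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes the word vectors are already given (the power-mean and concatenation step is left out), that each phrase vector is a normalized importance-weighted average, and that "greedy" means pairing each phrase with its most similar phrase on the other side and combining the two directional averages into an F1-style score. The function names `phrase_vector`, `cosine`, and `greedy_similarity` are hypothetical.

```python
import numpy as np


def phrase_vector(word_vecs, importance):
    """Importance-weighted average of word vectors (assumed weighting scheme)."""
    word_vecs = np.asarray(word_vecs, dtype=float)
    weights = np.asarray(importance, dtype=float)
    weights = weights / weights.sum()  # normalize weights to sum to 1
    return weights @ word_vecs


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def greedy_similarity(cand_phrases, ref_phrases):
    """Greedy matching score between two lists of phrase vectors.

    Each candidate phrase is matched to its most similar reference phrase
    (precision-like average) and vice versa (recall-like average); the two
    averages are combined as a harmonic mean. This combination is an
    assumption, not something the abstract specifies.
    """
    sim = np.array([[cosine(c, r) for r in ref_phrases] for c in cand_phrases])
    precision = sim.max(axis=1).mean()
    recall = sim.max(axis=0).mean()
    if precision + recall <= 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))
```

Unlike a lexical-overlap metric, a score of this kind can reward a generated summary that uses different words from the reference, provided their vectors are close in the embedding space.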
Keywords/Search Tags:Chinese Text Summarization, Generated Text Summarization, Attention Mechanism, Semantic Similarity, Text Evaluation Metrics