
Research On Text Summarization Generation Method Based On Whole Word Attention

Posted on: 2022-04-29    Degree: Master    Type: Thesis
Country: China    Candidate: X C Wang    Full Text: PDF
GTID: 2518306353477264    Subject: Computer Science and Technology
Abstract/Summary:
On today's Internet, a great deal of content exists in the form of text, and new text is generated every day. Most of it is written by ordinary users, so attributes such as category and sentiment are not clearly defined, yet the data often needs to be analyzed and summarized. The summarization methods designed for such data fall into two directions: one extracts a portion of the original text, the other analyzes the original text to generate an independent summary. Most traditional methods follow the first, extractive scheme, which is simple to implement and readily achieves good results. However, with the development of modern algorithms and GPU computing power, traditional extractive summarization suffers from shortcomings such as being hard to parallelize, fitting poorly, and running slowly, and it can no longer meet the demands of analyzing today's complex and varied data. Abstractive summarization has therefore received growing attention and development: it can generate summaries of any form, produce output closest to what a real human would write, and promote the automation of information processing.

Existing mainstream text summarization methods either use machine learning or neural networks to extract spans from the original document, the extracted spans constituting the model's summary, or use a sequence-to-sequence approach to encode the original document into a representation from which the summary is generated word by word. Both methods have problems. The summary of the first method depends on the original text, and the number of extracted spans is hard to control: when the number is fixed, too few spans carry insufficient information, while too many introduce redundancy. The second method is more flexible, but compared with self-supervised training tasks such as translation, summary generation depends more heavily on annotated data, and the output quality of current mainstream models still leaves much room for improvement.

In this regard, this thesis improves the attention mechanism and the method of fusing expert knowledge, and proposes FMBWWA, a text summarization generation method based on a whole word attention mechanism that also incorporates optimizations such as knowledge-data fusion and multi-task learning. On the test set, FMBWWA outperforms existing open-source models.

Each of these optimizations has a different focus. The whole word attention mechanism is devoted to strengthening the model's attention to text blocks: by pooling the attention weights of adjacent elements, a group of tokens is treated as a whole at the attention level, producing a one-to-many relationship that diversifies the model's attention and lets it learn different semantic subspaces.
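The abstract does not give a formula for this pooling, so the following is only a minimal sketch of one plausible reading: standard scaled dot-product attention whose score rows are average-pooled over a window of adjacent key positions before the softmax, so that neighbouring sub-word tokens share a single block-level weight. The function name whole_word_attention and the window parameter are illustrative assumptions, not the thesis's actual implementation.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def whole_word_attention(Q, K, V, window=3):
        """Scaled dot-product attention whose score rows are average-pooled
        over `window` adjacent key positions before the softmax, so that
        neighbouring sub-word tokens share one block-level weight.
        Q, K, V: (seq_len, d) arrays; `window` is a hypothetical
        hyper-parameter (the thesis does not specify one)."""
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)              # (seq_q, seq_k)

        # Sliding average pool over adjacent key positions: each column j
        # becomes the mean of the scores in its surrounding window.
        pooled = np.empty_like(scores)
        n = scores.shape[1]
        half = window // 2
        for j in range(n):
            lo, hi = max(0, j - half), min(n, j + half + 1)
            pooled[:, j] = scores[:, lo:hi].mean(axis=1)

        weights = softmax(pooled, axis=-1)         # one weight per pooled block
        return weights @ V                         # (seq_q, d)

    # Tiny usage example with random embeddings.
    rng = np.random.default_rng(0)
    Q = K = V = rng.normal(size=(6, 8))
    out = whole_word_attention(Q, K, V, window=3)  # shape (6, 8)

This sketch pools the raw scores before the softmax so the final weights stay normalized; the abstract's wording ("pooling the attention weight of adjacent elements") could equally mean pooling the normalized weights after the softmax, which is not resolved here.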
In multi-task learning, the aim is to make the model's word predictions as coherent as possible, which both forces the model to learn more information and improves the linguistic coherence of the generator; a rough illustration of the joint objective is sketched at the end of this abstract.

The sememe component is an attempt to integrate external knowledge-base information into the neural network: it treats human expert knowledge as auxiliary pruning information for the original model, fundamentally avoiding the incompleteness of any single source of expert information.

To sum up, the whole-word-attention-based text summarization method proposed in this thesis explores the field of text summarization generation at different levels and from different angles, identifies the shortcomings of existing automatic text summarization methods, and makes targeted improvements, enabling the final model to make full use of corpus information and knowledge-base information to obtain better summaries. It also provides new insight for later researchers.
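For the multi-task objective above, the abstract names the goals (coherent word prediction alongside summary generation) but not the loss, so the following is a minimal sketch under assumed details: a weighted sum of the main summary cross-entropy and a hypothetical auxiliary coherence loss, where the weight alpha and the shape of the auxiliary task are both assumptions.

    import numpy as np

    def cross_entropy(logits, targets):
        """Mean token-level cross-entropy. logits: (T, vocab); targets: (T,) ints."""
        logits = logits - logits.max(axis=-1, keepdims=True)
        log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
        return -log_probs[np.arange(len(targets)), targets].mean()

    def multitask_loss(sum_logits, sum_targets, coh_logits, coh_targets, alpha=0.5):
        """Weighted sum of the main summary-generation loss and a hypothetical
        auxiliary coherence loss; `alpha` and the exact auxiliary task are
        assumptions, since the abstract only states the tasks are trained jointly."""
        main = cross_entropy(sum_logits, sum_targets)
        aux = cross_entropy(coh_logits, coh_targets)
        return main + alpha * aux

    # Tiny usage example with random logits and targets.
    rng = np.random.default_rng(0)
    T, vocab = 5, 10
    loss = multitask_loss(rng.normal(size=(T, vocab)), rng.integers(vocab, size=T),
                          rng.normal(size=(T, vocab)), rng.integers(vocab, size=T))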
Keywords/Search Tags: Text summarization, FMBWWA, Whole word attention, Multi-task learning, Sememe