The Research On Abstractive Text Summarization Method Based On Reinforcement Learning And Sentence Level Evaluation

Posted on:2022-06-23

Degree:Master

Type:Thesis

Country:China

Candidate:X R Zhang

Full Text:PDF

GTID:2518306731487874

Subject:Computer Science and Technology

Abstract/Summary:

The rise of automatic text summarization has drawn a growing number of researchers to the field. Datasets play a pivotal role in the summarization task, but producing a high-quality dataset takes considerable time and cost, and research on dataset construction is still immature. There is therefore an urgent need for an accurate and efficient algorithm that can quickly produce a dataset for a given domain. In addition, researchers have tried various methods to address the exposure bias of the seq2seq + attention model, but existing methods either increase the training cost or reduce the readability of the summary, and these two goals have not been well reconciled. This paper therefore takes dataset production and the exposure bias problem of the seq2seq + attention model as its two research points. The main work and innovations of this paper are as follows:

(1) To address the lack of datasets, the long-document academic paper dataset IEEEETNN-base and the short-document academic paper dataset IEEEETNN-sum were constructed. At present, most published datasets in automatic text summarization belong to the news category, where the source documents are generally short and a reference summary has to be produced for each source document for supervised training and evaluation. Academic papers are long and come with their own abstracts, which addresses both problems and makes them excellent material for dataset production. Based on these considerations, this paper designs a complete pipeline for producing academic paper datasets, from data collection and parsing to post-processing, including a PDF paper crawler algorithm and three PDF paper parsing algorithms. Leaving aside the limitations caused by non-academic papers and by the PDFTron parsing tool, the parsing algorithm designed in this paper reaches an accuracy of 87.89%. To our knowledge, this is the first effort to build a dataset by parsing complex PDF papers, and the code is portable.

(2) To address the exposure bias of the seq2seq + attention model, this paper proposes an abstractive text summarization method based on reinforcement learning and sentence-level evaluation. The whole process is built on the policy gradient algorithm in reinforcement learning. It combines two methods designed in this paper, sentence-level loss and sentence-level reward, and uses the ROUGE evaluation metrics to compute the loss of the model's predicted sequence sentence by sentence. Although the proposed method takes into account both the syntactic structure of the sentence and the model's predicted output, the training cost of the whole process does not increase. The model was evaluated on the public CNN/DM dataset and on the two academic paper datasets produced in this paper. Experiments show that, compared with the baseline model, our method achieves good performance on ROUGE metrics and has strong generalization ability. Specifically, on the ROUGE-1 and ROUGE-L metrics, the generalization performance of the sentence-level reward method on IEEEETNN-base is 2.39% and 7.69% higher than that of the BertSumExtAbs model, respectively. On the short-document dataset IEEEETNN-sum, it is 3.12% and 8.19% higher than that of the BertSumExtAbs model, respectively.
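The sentence-level reward idea described in point (2) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract gives no formulas, so the sentence splitter, the unigram-overlap ROUGE-1 F1, the choice to match each generated sentence against its best-scoring reference sentence, and the function names `split_sentences`, `rouge1_f1`, `sentence_level_rewards`, and `policy_gradient_loss` are all assumptions made for illustration. The loss shown is a plain REINFORCE-style policy gradient term with an optional constant baseline.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on ., ! and ? (an assumption; the thesis does not specify one)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1, a simplified stand-in for ROUGE-1."""
    cand = re.findall(r"\w+", candidate.lower())
    ref = re.findall(r"\w+", reference.lower())
    if not cand or not ref:
        return 0.0
    # Clipped unigram matches: each token counts at most as often as it appears in both.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def sentence_level_rewards(generated, reference):
    """One reward per generated sentence: its best ROUGE-1 F1 against any reference sentence.

    Matching against the best reference sentence is a hypothetical design choice.
    """
    ref_sentences = split_sentences(reference)
    return [
        max((rouge1_f1(g, r) for r in ref_sentences), default=0.0)
        for g in split_sentences(generated)
    ]


def policy_gradient_loss(sentence_log_probs, rewards, baseline=0.0):
    """REINFORCE-style loss: -(sum over sentences of (reward - baseline) * log-probability).

    sentence_log_probs holds the summed token log-probabilities of each sampled sentence.
    """
    return -sum((r - baseline) * lp for lp, r in zip(sentence_log_probs, rewards))


if __name__ == "__main__":
    rewards = sentence_level_rewards(
        "The cat sat. Dogs bark.", "The cat sat. Birds fly."
    )
    print(rewards)
    print(policy_gradient_loss([-1.0, -2.0], rewards))
```

In a real training loop the log-probabilities would come from the seq2seq + attention decoder and the loss would be differentiated by an autodiff framework; plain floats are used here only to keep the sketch self-contained.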

Keywords/Search Tags:

Automatic text summarization, Data set, Seq2seq + attention, Exposure bias, Reinforcement learning, Sentence-level evaluation, ROUGE evaluation metrics

Related items

1	Research And Application Of Automatic Text Summarization Technology Based On Deep Learning
2	Research On Chinese Automatic Summarization And Its Evaluation Method
3	Research On Methods Of Improving Semantic Coherence Of Text Summarization
4	Evaluation Method Research Of Automatic Summarization Calculating The Similarity Of Text Based On HowNet
5	Research On Automatic Generation Method Of Chinese Text Summarization
6	Research On Automatic Summarization Of Microblog Events
7	Research On Automatically Generated Text Summary Evaluation Based On Deep Learning
8	Research And Implementation Of Automatic Text Summarization Based On Seq2Seq Model
9	Research On Abstract Generation Technology For News Text
10	Research On Chinese Text Summarization Algorithm Based On Deep Learning