
Research On Text Summarization Algorithm Based On Deep Learning

Posted on: 2022-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Li
Full Text: PDF
GTID: 2518306554471464
Subject: Computer technology
Abstract/Summary:
With the explosive growth of information on the Internet, efficiently obtaining information has become particularly important. Automatic text summarization can process one or more documents and distill short, key information from them, greatly alleviating the problem of user information overload. With the development of deep learning, text summarization has made major breakthroughs, but some problems remain. The main work of this thesis is as follows:

First, existing Chinese long-text summarization datasets are limited. This thesis uses crawler technology to obtain news data from the Sina News Center. After cleaning, filtering, and selection, a portion of the data was manually scored for how well each summary matches its article, and the better-matched summaries were retained. The result is a news summarization dataset of more than 500,000 items, including 16,000 manually scored items.

Second, to address the long-range dependency problem that generative summarization methods face on long texts, which leads to low summary accuracy and poor fluency, this thesis proposes a three-stage text summarization algorithm based on BERT. In the pre-training stage, BERT's bidirectional training is applied to the summarization task to obtain better contextual word vectors. In the key-sentence extraction stage, the extractive and generative approaches are combined: the pre-trained text vectors serve as input to the extraction model, and key sentences are selected by fine-tuning BERT. In the summary generation stage, CNN and self-attention modules are added to the sequence-to-sequence model to capture global information and reduce summary redundancy. The method reaches 46.2%, 32.2%, and 41.6% on ROUGE-1, ROUGE-2, and ROUGE-L respectively, demonstrating its effectiveness.

Third, in multi-document summarization the input is too large for the model to handle directly. This thesis proposes a Transformer-based multi-document summarization algorithm that ranks and filters the documents at the paragraph level to keep the input length manageable. An attention mechanism models the relationships between documents and shares information across paragraphs, and a two-layer Transformer is applied to reduce summary redundancy. Experiments on the ranked version of the WikiSum dataset reach 41.5%, 26.5%, and 35.7% on ROUGE-1, ROUGE-2, and ROUGE-L respectively, showing that the method has certain advantages for multi-document summarization.

The above work provides new research ideas for text summary generation and achieves clear improvements on the ROUGE metrics, which helps advance text summarization research; it has reference value and good practicality.
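The ROUGE-1, ROUGE-2, and ROUGE-L figures reported above are overlap-based F1 scores between a candidate summary and a reference summary. As a minimal self-contained sketch (not the thesis's actual evaluation code, which in practice would use a standard ROUGE package), they can be computed roughly as follows:

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """F1 over n-gram overlap between candidate and reference token lists."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def rouge_l(candidate, reference):
    """F1 based on the longest common subsequence (LCS) of the token lists."""
    m, k = len(candidate), len(reference)
    # Dynamic-programming LCS table.
    dp = [[0] * (k + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(k):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if candidate[i] == reference[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    lcs = dp[m][k]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / m, lcs / k
    return 2 * precision * recall / (precision + recall)
```

For Chinese text, tokenization (word segmentation or per-character splitting) must happen before scoring; the choice noticeably affects the resulting numbers.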
Keywords/Search Tags: Text summarization, BERT, Seq2Seq, Transformer, attention