The main goal of the text summarization task is to condense the main content of a piece of text and output a summary that is much shorter than the original text while still expressing its basic meaning. The BART (Bidirectional and Auto-Regressive Transformers) pre-trained model is one of the most popular models in the field of text summarization. However, because of the limit on input length, the input text usually has to be truncated, which causes key information in the source text to be lost. At the same time, the BART model often generates semantically incorrect words, leading to errors in the content of the final summary. To address these problems, this paper proposes two text summarization models based on the TextRank algorithm and the pointer-generator network:

· A text summarization model based on an improved TextRank algorithm and the BART model. This model first incorporates structural and semantic information of the text into the original TextRank algorithm, so that when TextRank ranks the sentences of a text, the top-ranked sentences contain more of its key content. The improved TextRank algorithm is then used to preprocess the input of the BART model, so that as much key information of the whole text as possible falls within the effective input length of the BART model.

· A text summarization model based on the pointer-generator network and the BART model. On top of preprocessing the BART input with the improved TextRank algorithm, a pointer-generator network is introduced into the model. This paper modifies the generation probability of the pointer-generator network by incorporating the similarity between the word generated by the model and the source word pointed to by the attention distribution: when this similarity is low, the probability of using the generated word is reduced and the source word is copied instead. This reduces the cases in which the BART model generates semantically wrong words and makes the summary inconsistent with the original text.

The two text summarization models constructed in this paper are compared in experiments on a public dataset, and the results show that the proposed models outperform commonly used models on the three ROUGE metrics. A case study is also carried out, which shows that the summaries generated by the proposed models better capture the main idea of the original text.
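To make the second model's copy mechanism concrete, the equations below restate the standard pointer-generator mixture of See et al. (2017) and sketch one possible way to attach the similarity term described above. The similarity function $\mathrm{sim}(\cdot,\cdot)$, the embedding function $e(\cdot)$, and the rescaled generation probability $\tilde{p}_{\mathrm{gen}}$ are illustrative notation only, not the exact formulation used in this paper.

\[
P(w) = p_{\mathrm{gen}}\, P_{\mathrm{vocab}}(w) + (1 - p_{\mathrm{gen}}) \sum_{i:\, w_i = w} a_i^t
\]

\[
\tilde{p}_{\mathrm{gen}} = p_{\mathrm{gen}} \cdot \mathrm{sim}\bigl(e(\hat{w}),\, e(w_{i^*})\bigr), \qquad i^* = \arg\max_i a_i^t \quad \text{(illustrative sketch)}
\]

Here $P_{\mathrm{vocab}}$ is the decoder's vocabulary distribution, $a_i^t$ is the attention weight on source word $w_i$ at step $t$, $\hat{w}$ is the candidate word generated by the decoder, and $w_{i^*}$ is the source word receiving the largest attention weight. A low similarity shrinks $\tilde{p}_{\mathrm{gen}}$, so the model tends to copy $w_{i^*}$ from the source rather than emit the generated word.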