With the rapid growth of Internet data, extracting useful information from massive text collections has become essential, and automatic text summarization offers a way to obtain information quickly and accurately. Driven by the rapid development of deep learning, research on text summarization has achieved promising results in recent years, yet existing systems still suffer from problems such as generating content that drifts off topic or cannot be controlled. This thesis studies these problems, and its main contributions are as follows.

First, since the Transformer was proposed, numerous Transformer-based summarization models have appeared, but they often fail to accurately reproduce the factual details and thematic information of the source text. To address the unconstrained, uncontrollable, and off-topic content produced by such models, this thesis proposes KGIT, a summarization model guided by pointer-selected key information. The model uses a Transformer as its backbone, with a BERT pre-trained encoder, and fuses keyword information into the decoding process. Keywords are extracted by a combined algorithm that uses an LSTM together with TextRank, and a pointer mechanism selects which keywords serve as guidance information; summaries are then generated under this guidance. KGIT thus correlates the source text with the keyword information and avoids generating summaries unrelated to the topic. Using ROUGE as the evaluation criterion, KGIT is compared with mainstream summarization models on the NLPCC2017 Chinese news summarization dataset, and the results show that the summaries generated by KGIT are closer to the reference summaries.

Second, summarization models typically generate summaries autoregressively, so exposure bias at test time causes the generated summaries to deviate from the reference summaries. In addition, because the training objective and the summary evaluation metric are computed differently, the candidate that is finally produced is not always the highest-scoring one. To address these problems, this thesis uses a contrastive loss over multiple candidate summaries to mitigate exposure bias, and a similarity loss between high-scoring summaries and the source text to improve the encoder's comprehension of the text. The model consists of the encoder and decoder of the CPT pre-trained model together with the key-information extraction and fusion module of KGIT. The model is first trained as a generative model that can produce multiple candidate summaries; the candidates are then divided into positive and negative samples according to their ROUGE scores. While retaining the cross-entropy loss, a contrastive loss is introduced by comparing positive and negative samples, and a similarity loss is introduced based on the similarity between high-scoring summaries and the source text and between the source text and its reconstruction; the three losses are summed as the final loss and the model is trained again. Experimental results on the LCSTS dataset show that the summaries generated by the model portray the source content in greater detail.
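To make the guidance step of the first contribution concrete, the following is a minimal sketch of extracting guidance keywords with TextRank. The thesis combines TextRank with an LSTM-based component and a pointer that selects keywords; only the TextRank ranking step is shown here, and the use of jieba, the function name, and the choice of top_k are illustrative assumptions rather than the thesis implementation.

```python
import jieba.analyse


def extract_guidance_keywords(source_text: str, top_k: int = 5) -> list:
    # Rank candidate words with TextRank and keep the top_k as guidance keywords
    # to be fused into the decoder (the pointer-based selection is omitted here).
    return jieba.analyse.textrank(source_text, topK=top_k, withWeight=False)
```

The second contribution superimposes three losses: cross-entropy, a contrastive loss over ROUGE-ranked candidate summaries, and a similarity loss between the source representation and that of a high-scoring summary. The sketch below illustrates one plausible way to combine them in PyTorch; the function names, the margin, and the weighting scheme are assumptions for illustration, not the exact formulation used in the thesis.

```python
import torch
import torch.nn.functional as F


def contrastive_ranking_loss(cand_scores: torch.Tensor, margin: float = 0.01) -> torch.Tensor:
    """cand_scores: model scores (e.g. length-normalized log-probabilities) of the
    candidate summaries, sorted so that index 0 has the highest ROUGE.
    Penalizes any pair whose model ranking disagrees with the ROUGE ranking."""
    loss = cand_scores.new_zeros(())
    n = cand_scores.size(0)
    for i in range(n):
        for j in range(i + 1, n):
            # The candidate with better ROUGE (index i) should receive the higher score.
            loss = loss + F.relu(cand_scores[j] - cand_scores[i] + margin * (j - i))
    return loss


def similarity_loss(src_repr: torch.Tensor, summ_repr: torch.Tensor) -> torch.Tensor:
    """Pulls the encoder representation of the source text and of a
    high-scoring summary together (1 - cosine similarity)."""
    return 1.0 - F.cosine_similarity(src_repr, summ_repr, dim=-1).mean()


def total_loss(ce_loss: torch.Tensor,
               cand_scores: torch.Tensor,
               src_repr: torch.Tensor,
               summ_repr: torch.Tensor,
               w_rank: float = 1.0,
               w_sim: float = 1.0) -> torch.Tensor:
    # Superimpose the three losses described in the abstract:
    # cross-entropy + contrastive (ranking) loss + similarity loss.
    return (ce_loss
            + w_rank * contrastive_ranking_loss(cand_scores)
            + w_sim * similarity_loss(src_repr, summ_repr))
```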