Nowadays, various business activities on the Internet generate large amounts of text data, and quickly extracting key information from these data to improve reading efficiency has become very important. Automatic text summarization based on the seq2seq model has become a key technique for solving this problem: after comprehensively understanding the original text, it summarizes the main content of an article in concise, condensed language. However, the model suffers from an insufficient understanding of the original text, and the generated summaries often lack important information fragments from the original. Therefore, this paper improves the quality of the generated summaries from two perspectives: mining deep semantic features in the text and capturing important information fragments from the original text.

Firstly, the original texts and reference summaries extracted from the public dataset LCSTS undergo preprocessing such as data cleaning and word segmentation, and the data are converted into a type suitable for the model; the lengths of the preprocessed texts and reference summaries are analyzed statistically to prepare for setting the relevant parameters in the subsequent modeling process.

Secondly, to obtain deep semantic features of the text, the pre-trained language model BERT is introduced as the word embedding layer. A traditional word embedding model such as Word2vec can only learn local contextual information and cannot distinguish the different meanings of the same word in different contexts, so the resulting word vectors cannot capture a rich variety of textual features. In contrast, the BERT pre-trained language model solves the polysemy problem by dynamically capturing the contextual information of the text, so that the generated word vectors change with the context, yielding a richer vectorized representation. The experimental results show that the
automatic text summarization model incorporating BERT word embeddings improves the quality of the generated summaries, raising the ROUGE-1, ROUGE-2, and ROUGE-L scores by 4.45%, 2.6%, and 3.94%, respectively, compared with the baseline model that uses Word2vec as the word embedding layer.

Finally, the automatic text summarization model incorporating BERT word embeddings is further improved and optimized from the perspective of capturing important information fragments in the original text, and an automatic text summarization model based on sequence copying is proposed. The existing copy mechanism based on the pointer-generator network copies words one by one, which leads to copy omissions and prevents the model from correctly performing sequence copying. Therefore, this paper imposes a continuity constraint on word copying and iteratively trains the model under this constraint, so that the model acquires the ability to copy sequences and can thus capture important information fragments from the original text. The experimental results show that the automatic text summarization model based on sequence copying improves the coverage of original-text information in the generated summaries and further improves model performance, raising the ROUGE-1, ROUGE-2, and ROUGE-L scores by 1.8%, 1.72%, and 1.92%, respectively, compared with the model incorporating BERT word embeddings.
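The copy mechanism of a pointer-generator network can be sketched in a few lines. The toy example below is only an illustration of the standard pointer-generator decoding step, not the thesis's trained model or its sequence-copy constraint: the vocabulary, source sentence, attention weights, and generation probability `p_gen` are all made-up values chosen to show how an out-of-vocabulary source word can still receive probability mass through copying.

```python
# Toy pointer-generator decoding step: mix a generation distribution over
# a fixed vocabulary with a copy distribution given by attention weights
# over the source tokens. All numbers are made-up illustrative values.

vocab = ["<unk>", "the", "company", "announced", "profits"]
source_tokens = ["company", "announced", "record", "profits"]  # "record" is OOV

attention = [0.1, 0.2, 0.5, 0.2]          # attention over source positions
p_vocab = [0.05, 0.15, 0.30, 0.30, 0.20]  # generation distribution over vocab
p_gen = 0.4                               # generate-vs-copy switch

# Extended vocabulary: fixed vocab plus out-of-vocabulary source words.
extended_vocab = vocab + [w for w in source_tokens if w not in vocab]

p_final = {w: 0.0 for w in extended_vocab}
for w, p in zip(vocab, p_vocab):          # generation part, weighted by p_gen
    p_final[w] += p_gen * p
for pos, w in enumerate(source_tokens):   # copy part, weighted by 1 - p_gen
    p_final[w] += (1 - p_gen) * attention[pos]

# The OOV word "record" can still be produced because it is copied.
best = max(p_final, key=p_final.get)
print(best, round(p_final[best], 3))      # record 0.3
```

The word-by-word nature of this mixing is exactly what the sequence-copy approach constrains: without a continuity constraint across decoding steps, nothing forces consecutive copy decisions to pick adjacent source positions, which is how copy omissions arise.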
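The ROUGE scores reported above can be understood with a small self-contained sketch. The code below computes ROUGE-1, ROUGE-2 (n-gram overlap F1), and ROUGE-L (longest-common-subsequence F1) on a toy candidate/reference pair; it is a simplified illustration of the metrics, not the evaluation toolkit used in the experiments, and the example sentences are invented.

```python
from collections import Counter

def rouge_n(candidate, reference, n):
    """ROUGE-N F1: n-gram overlap between candidate and reference tokens."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # clipped n-gram match count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence (LCS)."""
    m, k = len(candidate), len(reference)
    dp = [[0] * (k + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(k):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if candidate[i] == reference[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    lcs = dp[m][k]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / m, lcs / k
    return 2 * precision * recall / (precision + recall)

ref = "the company announced record profits".split()
cand = "the company announced profits".split()
print(round(rouge_n(cand, ref, 1), 3))  # 0.889
print(round(rouge_n(cand, ref, 2), 3))  # 0.571
print(round(rouge_l(cand, ref), 3))     # 0.889
```

Note how dropping a single content word ("record") hurts ROUGE-2 far more than ROUGE-1, which is why copy omissions of important fragments show up clearly in these metrics.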