With the continuous development of the Internet, the volume of text information has exploded, and information overload now seriously hampers people's access to useful information. Automatic summarization technology can extract valuable information quickly, but the generative (abstractive) approach to automatic summarization still has shortcomings. During generation, inaccurate word segmentation and inaccurate extraction of feature information can make the generated summary disfluent, insufficiently accurate, and incomplete. To address these problems, this thesis studies a new abstractive summary generation method. The main work is as follows:

1. To address inaccurate final results caused by inaccurate word segmentation, this thesis applies character-level embedding at the input of the BERT model and uses the resulting BERT character vectors as input to the subsequent model. Because a single Chinese character can itself form a word, individual Chinese characters are used as the input units; this avoids the word segmentation step and produces high-quality character vectors. Experimental results show that, while maintaining high output accuracy, this approach greatly improves training efficiency and shortens training time compared with a BERT model using conventional input.

2. To address inaccurate results caused by inaccurate feature extraction during summary generation, this thesis combines the BERT character vectors described above with an improved LeakGAN model, a model based on a hierarchical reinforcement learning strategy. An attention mechanism is added to the discriminator of the model to extract key feature information and to capture both global and local associations. Combined with the BERT character vectors, this yields high-quality feature information from which a more accurate summary is generated.

The large-scale Chinese short text summarization dataset LCSTS is used for the simulation experiments. Under both automatic ROUGE scoring and manual evaluation, the summary generation method studied in this thesis improves the accuracy and fluency of the generated summaries to a certain degree compared with an extractive summarization model and the generative baseline model Seq2Seq.
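The character-level input described in point 1 can be sketched as follows. This is a minimal illustration only: the toy vocabulary `vocab` and the helper names `char_tokenize` and `encode` are hypothetical, and the thesis does not specify its exact preprocessing. A real system would load the vocabulary file shipped with the pretrained Chinese BERT model.

```python
# Minimal sketch: character-level input for a BERT-style encoder.
# Assumption: a toy vocabulary; a real pretrained model ships its own vocab file.

def char_tokenize(text):
    """Split text into single characters, dropping whitespace (no word segmentation)."""
    return [ch for ch in text if not ch.isspace()]


def encode(text, vocab, max_len=16):
    """Wrap characters in [CLS]/[SEP], map them to ids, and pad to max_len."""
    tokens = ["[CLS]"] + char_tokenize(text)[: max_len - 2] + ["[SEP]"]
    ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    mask = [1] * len(ids)  # 1 marks a real token, 0 marks padding
    padding = max_len - len(ids)
    return ids + [vocab["[PAD]"]] * padding, mask + [0] * padding


# Hypothetical toy vocabulary, for illustration only.
vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "自": 4, "动": 5, "摘": 6, "要": 7}

ids, mask = encode("自动摘要技术", vocab, max_len=10)
print(ids)   # [2, 4, 5, 6, 7, 1, 1, 3, 0, 0]
print(mask)  # [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
```

Because each character is its own unit, no segmenter is involved; characters missing from the vocabulary (here 技 and 术) fall back to `[UNK]`.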
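The form of the attention mechanism added to the LeakGAN discriminator is not given in the abstract. As an assumption, one common choice, scaled dot-product attention pooling with a single query vector, is sketched below. `softmax` and `attention_pool` are hypothetical helpers; in a trained discriminator the query would be a learned parameter, the per-position features would come from the discriminator's feature extractor, and the pooled vector would be passed on to the real/fake classifier.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Scaled dot-product attention pooling over a sequence of feature vectors.

    Each position is scored against the query, the scores are normalised with
    softmax, and the features are combined as a weighted sum.
    """
    scale = math.sqrt(len(query))
    scores = [sum(f * q for f, q in zip(feat, query)) / scale for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * feat[d] for w, feat in zip(weights, features)) for d in range(dim)]
    return pooled, weights


# Toy per-position features (3 positions, 2 dimensions) and a hypothetical query.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]
pooled, weights = attention_pool(features, query)
print([round(w, 3) for w in weights])
print([round(x, 3) for x in pooled])
```

Positions whose features align with the query receive larger weights, so the pooled vector emphasises them; the weights always sum to 1.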
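ROUGE scores for Chinese summaries on LCSTS are often computed over characters rather than words. The sketch below computes ROUGE-N and ROUGE-L as F1 scores (precision and recall weighted equally) on character sequences, assuming that setup. The thesis's exact ROUGE configuration is not stated, and the helper names here are hypothetical; the official ROUGE toolkit additionally handles details such as multiple references.

```python
from collections import Counter


def rouge_n(candidate, reference, n=1):
    """ROUGE-N F1 over character n-grams."""
    def ngrams(s):
        return Counter(tuple(s[i:i + n]) for i in range(len(s) - n + 1))

    c, r = ngrams(candidate), ngrams(reference)
    overlap = sum((c & r).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def lcs_len(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence."""
    lcs = lcs_len(candidate, reference)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(candidate), lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


cand, ref = "自动摘要技术", "自动文本摘要"
print(round(rouge_n(cand, ref, 1), 3))  # 0.667
print(round(rouge_n(cand, ref, 2), 3))  # 0.4
print(round(rouge_l(cand, ref), 3))     # 0.667
```

Here the candidate shares four characters (自, 动, 摘, 要) and two bigrams (自动, 摘要) with the reference, and its longest common subsequence with the reference is 自动摘要.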