
Research On Chinese Automatic Summarization Based On Deep Learning

Posted on: 2022-09-04    Degree: Master    Type: Thesis
Country: China    Candidate: X L Song    Full Text: PDF
GTID: 2518306320954199    Subject: Software engineering
Abstract/Summary:
Using computers to automatically extract summaries from Chinese text helps people obtain key information quickly from massive amounts of data and improves reading efficiency. At present, abstractive summarization based on the sequence-to-sequence (seq2seq) model is a research hotspot in Chinese information processing. This thesis constructs a summarization model: a seq2seq model that combines part-of-speech features with an attention mechanism. In addition, a method of Chinese summary generation based on the GPT pre-trained model is proposed to further improve summary quality. Experiments and analysis are performed on the automatic summarization dataset published by the Natural Language Processing and Chinese Computing (NLPCC) conference in 2017, and ROUGE is used to evaluate the quality of the generated summaries. The main work of this thesis is as follows:

(1) A Chinese automatic summarization model named BERT-BiLSTM-LSTM, which combines part-of-speech features and an attention mechanism, is proposed. The seq2seq model based on the Recurrent Neural Network (RNN) has the following issues. First, RNN gradients are dominated by short-distance dependencies, which leads to poor memory of long-distance text. Second, the encoder outputs a context vector of fixed dimension, so the amount of information from the original text that can be stored in the intermediate semantic vector is limited. Third, the model fails to take into account the influence of part of speech on the generation of summary vocabulary. To address these issues, this thesis constructs a seq2seq model that integrates part-of-speech features and an attention mechanism: the BERT Chinese pre-trained model is used to build word embeddings, the encoder uses a two-layer Bidirectional Long Short-Term Memory (BiLSTM) structure, and the decoder uses a unidirectional four-layer Long Short-Term Memory (LSTM) architecture. The results show that the proposed model improves the quality of the generated summaries: evaluated with ROUGE, it outperforms the baseline model by 3.38 points on F1.

(2) A method for generating Chinese summaries based on the Generative Pre-trained Transformer (GPT) is proposed. The recurrent neural network is a chain-structured network that prevents parallel computation, so this thesis introduces the Transformer-based GPT pre-trained model. The study found that the GPT pre-trained model is general-purpose, and fine-tuning it places low demands on data quantity and computing equipment. However, the model uses BPE encoding, and stop words and punctuation marks appear in the generated summaries, which distorts the evaluation of the summarization. For these reasons, this thesis builds a GPT fine-tuning model based on part-of-speech features: first, the encoding granularity is adjusted to the word level; second, part-of-speech features are integrated into the input; finally, the model is fine-tuned. Comparative experiments show that the model improves on the baseline model by 2.66 points on F1.

(3) The constructed models are used as the back end to build a Chinese automatic summarization system based on the B/S (browser/server) architecture, providing a user interface through which others can experience Chinese summary generation. At the same time, user-submitted data is collected as a reserve dataset resource for future work.
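The attention mechanism in work (1) can be illustrated with a minimal sketch: score each encoder hidden state against the current decoder state, softmax the scores, and take the weighted sum as the context vector. This is a generic dot-product attention in pure Python for illustration only, not the thesis implementation (which operates on BiLSTM states and BERT embeddings); the function name and toy vectors are hypothetical.

```python
import math

def attention_context(decoder_state, encoder_states):
    """Dot-product attention sketch (illustrative, not the thesis code).

    Scores each encoder hidden state against the decoder state,
    normalizes the scores with a softmax, and returns the weighted sum
    of encoder states (the context vector) plus the attention weights.
    """
    # Dot-product alignment scores between decoder state and each encoder state.
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of encoder states.
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return context, weights

# Toy example: three encoder states of dimension 2, one decoder state.
context, weights = attention_context([1.0, 0.0],
                                     [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Because the weights are a softmax, they always sum to 1, and states more aligned with the decoder state receive larger weights, which is what lets the decoder attend to different parts of the source text instead of a single fixed-dimension vector.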
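Both experiments report F1 under ROUGE. As a worked illustration of that metric, the following is a minimal pure-Python sketch of ROUGE-1 F1 (unigram overlap with clipped counts); it is a simplified stand-in for the actual ROUGE toolkit, and the function name is illustrative.

```python
from collections import Counter

def rouge1_f1(candidate_tokens, reference_tokens):
    """ROUGE-1 F1 sketch: unigram overlap between a generated summary
    and a reference summary, with counts clipped to the reference
    (a simplified version of the standard ROUGE computation)."""
    cand = Counter(candidate_tokens)
    ref = Counter(reference_tokens)
    # Clipped overlap: a candidate unigram counts at most as often
    # as it appears in the reference.
    overlap = sum(min(cand[w], ref[w]) for w in cand)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Toy example with hypothetical word-segmented summaries:
score = rouge1_f1(["the", "cat", "sat"], ["the", "cat", "slept"])
```

For Chinese summaries the tokens would be characters or segmented words rather than space-separated English words, but the overlap computation is the same; this also shows why stray stop words and punctuation in GPT's BPE output inflate the candidate length and depress precision, motivating the word-level re-encoding in work (2).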
Keywords/Search Tags:automatic summarization, sequence to sequence, generative pre-trained transformer, part of speech feature