
An Automatic Summarization Model Based On Deep Learning For Chinese

Posted on: 2020-10-15
Degree: Master
Type: Thesis
Country: China
Candidate: B Liu
GTID: 2428330590495851
Subject: Software engineering
Abstract/Summary:
The growing volume of text on the Internet means that people spend more and more time selecting and screening the key information it contains. As a way of extracting key information from long texts, automatic summarization can substantially reduce this time cost in the era of information explosion, and it has attracted increasing attention from researchers. By how the summary is generated, automatic summarization techniques fall into two classes: extractive and abstractive. Extractive summarization relies solely on the original text and therefore has certain limitations. Abstractive summarization is far more flexible, but when generating Chinese summaries it remains difficult to preserve sufficient information when initializing the text representation and to obtain high-quality data. Through a series of natural language processing techniques, our proposed method generates more concise and accurate Chinese summaries.

First, inspired by the way the fastText model captures prefix and suffix information in English words, we propose a stroke-based text vector encoding designed around the structural characteristics of Chinese, so that the text vectors carry sufficient semantic information. We use a stroke-granularity encoding to build a stroke dictionary and then obtain the text vectors with the Skip-Gram model. This allows our model to capture a finer-grained representation of the components of Chinese characters.

Second, we optimize the Seq2Seq model, which is widely used for text generation, in three ways. A Bi-LSTM encoder partly mitigates the loss of information on long input sequences and supplements each position with context read from back to front. An attention mechanism captures the correlation between input and output words. Beam search in the decoder at test time optimizes the output sequence.

The model is trained on the LCSTS dataset. Experimental results, measured by ROUGE scores and human evaluation, show that our encoding method and model are effective in improving the readability of the generated summaries.
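The stroke-based encoding can be illustrated with a small sketch. It assumes a five-class stroke scheme similar to the one used by cw2vec (1 horizontal, 2 vertical, 3 left-falling, 4 right-falling or dot, 5 turning), a tiny hypothetical `STROKES` table, stroke n-grams of length 3 to 5, and fastText-style Skip-Gram examples in which a center word, represented by itself plus its stroke n-grams, predicts each word in its context window. The abstract does not confirm these settings, and the embedding training itself (for example with negative sampling) is omitted.

```python
from typing import Dict, List, Tuple

# Hypothetical stroke dictionary (assumption): each character maps to its
# stroke sequence using five classes: 1 horizontal, 2 vertical,
# 3 left-falling, 4 right-falling/dot, 5 turning.
STROKES: Dict[str, str] = {
    "大": "134",
    "人": "34",
    "口": "251",
    "中": "2512",
    "木": "1234",
}


def stroke_ngrams(word: str, n_min: int = 3, n_max: int = 5) -> List[str]:
    """Return every stroke n-gram (n_min..n_max) of a word's stroke sequence."""
    seq = "".join(STROKES[ch] for ch in word)
    return [
        seq[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(seq) - n + 1)
    ]


def training_examples(words: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build Skip-Gram examples: (center word + its stroke n-grams, context word)."""
    examples = []
    for i, center in enumerate(words):
        # fastText-style: keep the whole word as one unit next to its n-grams.
        units = [center] + stroke_ngrams(center)
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                examples.append((units, words[j]))
    return examples


if __name__ == "__main__":
    print(stroke_ngrams("大人"))  # ['134', '343', '434', '1343', '3434', '13434']
    for units, context in training_examples(["大", "人", "口"], window=1):
        print(units, "->", context)
```

Short characters such as 人 yield no n-grams of length 3 or more, which is one reason to keep the whole word as an extra unit.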
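The Bi-LSTM encoder output and the attention step can be sketched in plain Python. This sketch assumes the forward and backward LSTM outputs have already been computed and are aligned by token position (the LSTM cells themselves are omitted), concatenates them into one state per token, and uses dot-product (Luong-style) scoring. The abstract does not state which attention variant the thesis uses; a trained model would usually add a learned projection so that the decoder state and encoder states need not share a dimension.

```python
import math
from typing import List, Tuple

Vector = List[float]


def bidirectional_states(forward: List[Vector], backward: List[Vector]) -> List[Vector]:
    """Concatenate forward and backward encoder outputs per token: [h_fwd; h_bwd].

    Assumes both lists are already aligned by token position.
    """
    return [f + b for f, b in zip(forward, backward)]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(query: Vector, keys: List[Vector]) -> Tuple[List[float], Vector]:
    """Score each encoder state against the decoder state, normalize the scores,
    and return (attention weights, context vector = weighted sum of states)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    context = [
        sum(w * key[d] for w, key in zip(weights, keys))
        for d in range(len(keys[0]))
    ]
    return weights, context


if __name__ == "__main__":
    states = bidirectional_states([[1.0], [0.0]], [[0.0], [1.0]])
    weights, context = dot_attention([2.0, 0.0], states)
    print(states, weights, context)
```

The context vector is recomputed at every decoding step, so each output word can draw on a different mix of input positions.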
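Beam search at test time can be sketched as follows. `step` is a hypothetical stand-in for one decoder step: given the tokens generated so far, it returns log-probabilities for candidate next tokens. The `TOY` distribution is invented purely to show a case where a beam of width 2 finds a higher-probability sequence than greedy decoding (width 1). Length normalization, which practical systems often add, is omitted.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[Tuple[str, ...], float]  # (tokens, summed log-probability)
StepFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def beam_search(step: StepFn, beam_width: int, max_len: int,
                eos: str = "</s>") -> List[Hypothesis]:
    """Keep the beam_width best partial sequences at each decoding step."""
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        candidates = [
            (prefix + (token,), score + logp)
            for prefix, score in beams
            for token, logp in step(prefix).items()
        ]
        candidates.sort(key=lambda h: h[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            # Hypotheses ending in EOS leave the beam; this simple variant
            # does not refill their slots, so the beam can shrink.
            if tokens[-1] == eos:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)  # keep unfinished hypotheses if max_len is reached
    return sorted(finished, key=lambda h: h[1], reverse=True)


# Toy next-token distributions (invented for illustration only).
TOY: Dict[Tuple[str, ...], Dict[str, float]] = {
    (): {"A": 0.5, "B": 0.4, "</s>": 0.1},
    ("A",): {"</s>": 0.4, "C": 0.3, "D": 0.3},
    ("B",): {"</s>": 0.9, "C": 0.1},
}


def toy_step(prefix: Tuple[str, ...]) -> Dict[str, float]:
    probs = TOY.get(prefix, {"</s>": 1.0})
    return {token: math.log(p) for token, p in probs.items()}


if __name__ == "__main__":
    greedy = beam_search(toy_step, beam_width=1, max_len=5)[0]
    beam = beam_search(toy_step, beam_width=2, max_len=5)[0]
    print(greedy[0], round(math.exp(greedy[1]), 2))  # ('A', '</s>') 0.2
    print(beam[0], round(math.exp(beam[1]), 2))      # ('B', '</s>') 0.36
```

Greedy decoding commits to "A" because it is the best first token, while the beam keeps "B" alive long enough to discover its much stronger continuation.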
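ROUGE scores for Chinese summaries on LCSTS are commonly computed over characters rather than segmented words. The sketch below assumes character-level tokens and reports ROUGE-N and ROUGE-L as balanced F1 scores. It is a simplified illustration, not a replacement for the official ROUGE toolkit, whose options (and the exact evaluation setup used in the thesis) may differ.

```python
from collections import Counter
from typing import List


def _f1(overlap: float, cand_total: int, ref_total: int) -> float:
    if overlap == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """Character-level ROUGE-N F1 with clipped n-gram counts."""
    def ngrams(text: str) -> Counter:
        return Counter(text[i:i + n] for i in range(len(text) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # & keeps the minimum count per n-gram
    return _f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, O(len(a) * len(b)) time."""
    dp: List[int] = [0] * (len(b) + 1)
    for x in a:
        diag = 0  # dp[j - 1] from the previous row
        for j, y in enumerate(b, start=1):
            above = dp[j]
            dp[j] = diag + 1 if x == y else max(above, dp[j - 1])
            diag = above
    return dp[-1]


def rouge_l(candidate: str, reference: str) -> float:
    """Character-level ROUGE-L F1 based on the LCS."""
    return _f1(lcs_length(candidate, reference), len(candidate), len(reference))


if __name__ == "__main__":
    cand, ref = "今天天气好", "今天天气很好"
    print(round(rouge_n(cand, ref, 1), 4),
          round(rouge_n(cand, ref, 2), 4),
          round(rouge_l(cand, ref), 4))  # 0.9091 0.6667 0.9091
```

ROUGE-1 rewards shared characters, ROUGE-2 rewards shared character pairs, and ROUGE-L rewards characters that appear in the same order, which is why the three scores can diverge for the same summary.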
Keywords/Search Tags: Deep learning, Generative summarization, stroke_embedding, Seq2Seq, Attention mechanism