
Research On Short Text Automatic Summarization Method Based On Deep Learning

Posted on: 2020-12-25
Degree: Master
Type: Thesis
Country: China
Candidate: C X Dong
Full Text: PDF
GTID: 2428330572971145
Subject: Control Science and Engineering
Abstract/Summary:
With the emergence of new media platforms, the volume of information that people encounter every day has exploded, leading to information overload. As the pace of life accelerates, people have no time to sort through everything they receive. Reading a summary lets them understand the original text more efficiently and reduces the time and effort spent browsing. With the rise of deep learning, more and more researchers use deep learning methods to generate document summaries and are gradually applying them in practical systems. This thesis therefore studies the semantic representation of short text and automatic summarization with a sequence-to-sequence architecture based on deep learning, and applies the semantic representation of short text to the automatic summarization task.

Current unsupervised text representation methods mainly include vector space models and doc2vec. These methods achieve good results when the corpus is large, but they ignore the word order information in the text. To address this problem, this thesis proposes an unsupervised model named RevONet, which takes word order into account and uses a convolutional neural network to learn the semantic representation of a document. RevONet is compared with term frequency, term frequency-inverse document frequency (TF-IDF), LDA, LSI, and doc2vec on a text classification task. The experimental results show that RevONet reaches an accuracy of 78.7%, which is better than the vector space models and doc2vec. This verifies the validity of RevONet for semantic representation, and the model is then applied to the automatic summarization task to measure the semantic similarity between the source text and the target summary.

Based on the characteristics of text summaries, this thesis proposes DocSNet, a model built on the sequence-to-sequence architecture that maximizes document similarity. DocSNet uses the semantic representation of the source text extracted by RevONet to calculate the similarity between the source text and the target summary, and generates the summary by optimizing the model to maximize that similarity. Within the sequence-to-sequence architecture, DocSNet uses a bidirectional LSTM as the encoder and a unidirectional LSTM as the decoder. DocSNet also introduces an attention mechanism to further improve the quality of the generated summary. In experiments on the large-scale Chinese short text summarization dataset released by Harbin Institute of Technology, DocSNet reaches ROUGE-1 and ROUGE-L scores of 33.6% and 30.4%, respectively, which verifies the validity of the model.
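
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how ROUGE-1 (unigram overlap) and ROUGE-L (longest common subsequence) can be computed for one candidate summary against one reference. It is not the evaluation script used in the thesis. The function names are illustrative, and the abstract does not state the tokenization or whether the reported scores are precision, recall, or F1, so the sketch returns all three and accepts any list of tokens.

```python
from collections import Counter


def _prf(match, cand_len, ref_len):
    """Turn a match count into (precision, recall, F1)."""
    if match == 0:
        return 0.0, 0.0, 0.0
    precision = match / cand_len
    recall = match / ref_len
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def rouge_1(candidate, reference):
    """ROUGE-1: clipped unigram overlap between two token lists."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    return _prf(overlap, len(candidate), len(reference))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L: scores based on the longest common subsequence."""
    match = lcs_length(candidate, reference)
    return _prf(match, len(candidate), len(reference))


if __name__ == "__main__":
    # Toy example with whitespace tokens; Chinese text would typically be
    # split into characters or words first (an assumption, not from the source).
    cand = "the cat sat".split()
    ref = "the cat sat down".split()
    print("ROUGE-1 (P, R, F1):", rouge_1(cand, ref))
    print("ROUGE-L (P, R, F1):", rouge_l(cand, ref))
```

ROUGE-1 ignores token order, while ROUGE-L rewards tokens that appear in the same order as in the reference, which is why the two scores are usually reported together.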
Keywords/Search Tags: Automatic Summary, Seq2Seq, Attention Mechanism, Unsupervised Learning, Semantic Representation