Research And Implementation Of Multilingual Automatic Summarization System Based On Deep Learning

Posted on: 2020-07-12
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Yi
Full Text: PDF
GTID: 2428330572489366
Subject: Computer technology
Abstract/Summary:
With the development of Internet technology, international exchanges are becoming more frequent. People are surrounded by large amounts of information every day, and efficiently selecting the information they need most is becoming increasingly important. Automatic summarization is a key technology for addressing this information explosion. Cross-language automatic summarization lets people browse literature from many countries and quickly learn about different countries and regions, which gives it significant research and application value.

The multilingual automatic summarization system implemented in this dissertation has two main functions: monolingual automatic summarization and cross-language automatic summarization. It can process short scientific-literature texts in Chinese, English and Korean. Based on the RNNLM model, this dissertation proposes a monolingual automatic summarization method with pre-trained word vectors, which generates a summary in the same language as the input text. Based on the Seq2Seq model, this dissertation proposes a cross-language automatic summarization method that does not require machine translation and directly generates a summary in one language for a text in another.

First, we collected abstracts and titles of scientific literature to build a Chinese-Korean-English parallel corpus. We observed the performance of the model on the test set using different recurrent cell structures and different neural network structures. At the same time, with an Attention mechanism added to the Seq2Seq model, Word2Vec and RNNLM were each used to pre-train the word vectors, and the model's performance on the test set was observed for each. Second, using a training scheme based on the Seq2Seq model and the Chinese-Korean-English parallel corpus, cross-language automatic summarization without machine translation was realized: given a text in one language, the model directly generates a summary of that text in a different language. Finally, we designed and implemented a multilingual automatic summarization system based on the Django framework, described the overall design of the system and its functional modules, and demonstrated the monolingual and cross-language automatic summarization functions.

The experimental results show that, in the monolingual automatic summarization task, the RNNLM-based word vector pre-training scheme proposed in this dissertation outperforms the Word2Vec-based scheme, with ROUGE-1, ROUGE-2 and ROUGE-L scores on the test set of 32.57%, 9.17% and 25.70%, respectively. In the cross-language automatic summarization task, the proposed method also performs well on the test set: across the six cross-language experiments, ROUGE-1 averaged 23.30%, ROUGE-2 averaged 4.93%, and ROUGE-L averaged 19.47%. The multilingual automatic summarization system realized in this dissertation can meet the practical needs of science and technology researchers in Northeast Asia and improve the efficiency of reading literature.
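
For readers unfamiliar with the ROUGE-1, ROUGE-2 and ROUGE-L figures quoted in the abstract, the following is a minimal sketch of how such scores can be computed for one candidate/reference pair. It is not the dissertation's evaluation code. The abstract does not say which ROUGE variant was used, so the F1 form and the pre-tokenized input are assumptions, and the function names (`rouge_n`, `rouge_l`) are illustrative only. Chinese and Korean text would first need word or character segmentation, which is left out here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Combine precision and recall into F1; 0.0 when there is no overlap."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N (assumed F1 variant): clipped n-gram overlap between two token lists."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # '&' keeps the minimum count per n-gram
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L (assumed F1 variant): based on the longest common subsequence."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


if __name__ == "__main__":
    # Toy English example; the tokens are made up for illustration.
    cand = "the cat sat on the mat".split()
    ref = "the cat lay on the mat".split()
    print(rouge_n(cand, ref, 1), rouge_n(cand, ref, 2), rouge_l(cand, ref))
```

Scores reported for a whole test set, such as those in the abstract, are normally averages of per-document scores like these, expressed as percentages.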
Keywords/Search Tags:cross-language automatic summarization, abstractive summarization, deep learning, Seq2Seq model