
Research On Automatic Chinese Text Summarization Based On Stacked BiLSTM

Posted on: 2020-01-09  Degree: Master  Type: Thesis
Country: China  Candidate: X T Shi  Full Text: PDF
GTID: 2428330575989052  Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, people are exposed to ever more text data. Summaries help readers cope with having too much to read and too little time. Automatic text summarization aims to replace manual work with computers, condensing long texts into concise summaries, reducing labor costs and increasing the number of summaries produced. Many existing text summarization techniques are still extractive, and the summaries they generate are simplistic and cannot fully express the meaning of an article. Advances in deep learning have given new direction to research on abstractive text summarization. At present, neural network models based on the Seq2Seq framework have become the basic framework for research on abstractive summarization. On this basis, this thesis constructs several abstractive text summarization models, divided by the number of dictionaries into single-dictionary and multi-dictionary models. The main research contents are as follows:

1) This thesis constructs a single-dictionary model based on stacked BiLSTM with copy and coverage mechanisms. The model uses stacked BiLSTM for information extraction to improve its ability to understand semantics. It fuses the copy mechanism and the coverage mechanism, which increases the coherence and readability of the generated summary and reduces out-of-vocabulary and word-repetition problems.

2) Building on the single-dictionary study, this thesis constructs a multi-dictionary automatic text summarization model with a coverage mechanism based on stacked BiLSTM, in order to simplify the model structure and improve its efficiency.

3) This thesis adds ensemble learning to the experiments. Because different encoders capture different semantics, several encoders are used for model training. This increases the diversity of semantics the model can capture, and integrating multiple models improves prediction accuracy and generalization ability.

The experiments are evaluated with the ROUGE metric on LCSTS2.0, a short-text Chinese summarization dataset built by the Research Center for Intelligent Computing at Harbin Institute of Technology. The results show that, compared with the single-dictionary model, the multi-dictionary automatic text summarization model with stacked BiLSTM and the coverage mechanism improves the ROUGE score by 10%, and model ensembling on top of it improves the ROUGE score by a further 5% or so.
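The coverage mechanism mentioned in contribution 1) maintains a running sum of the attention distributions from earlier decoding steps, so the decoder is discouraged from attending to (and repeating) the same source words. A minimal NumPy sketch of one coverage-augmented additive attention step is shown below; the weight names (`w_h`, `w_s`, `w_c`, `v`) and shapes are illustrative assumptions, not the thesis's actual parameters:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def coverage_attention(enc_states, dec_state, coverage, w_h, w_s, w_c, v):
    """One decoding step of coverage-augmented additive attention.

    enc_states: (T, H) encoder hidden states (e.g. from a stacked BiLSTM)
    dec_state:  (H,)   current decoder hidden state
    coverage:   (T,)   running sum of attention weights from earlier steps
    Returns the attention distribution and the updated coverage vector.
    """
    # Score each source position: e_i = v . tanh(W_h h_i + W_s s + w_c * c_i)
    scores = np.array([
        v @ np.tanh(w_h @ h + w_s @ dec_state + w_c * c)
        for h, c in zip(enc_states, coverage)
    ])
    attn = softmax(scores)        # attention distribution over source positions
    coverage = coverage + attn    # accumulate, penalising re-attending later
    return attn, coverage

# Toy usage with random weights (hypothetical shapes, for illustration only)
rng = np.random.default_rng(0)
T, H = 5, 8
enc = rng.standard_normal((T, H))
dec = rng.standard_normal(H)
attn, cov = coverage_attention(
    enc, dec, np.zeros(T),
    rng.standard_normal((H, H)),  # W_h
    rng.standard_normal((H, H)),  # W_s
    rng.standard_normal(H),       # w_c (per-dimension coverage weight)
    rng.standard_normal(H),       # v
)
```

In a copy-mechanism model, the same attention distribution also serves as the probability of copying each source word, which is why suppressing repeated attention also suppresses repeated copying.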
Keywords/Search Tags:Deep Learning, Automatic Summarization, Seq2Seq, Stack BiLSTM