Research On Key Techniques Of Two Phase Automatic Summarization Algorithm For Long Text

Posted on: 2018-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: S Wang
Full Text: PDF
GTID: 2428330623450979
Subject: Management Science and Engineering
Abstract/Summary:
With the explosive growth of information on the Internet, improving the efficiency of knowledge acquisition has become increasingly important. Automatic text summarization supports fast knowledge acquisition by compressing and refining information. Text similarity calculation is a key step that determines the final quality of automatic summarization: if it can be effectively improved, the accuracy of the summarization algorithm rises, and with it the overall performance of the whole summarization system. To address the shortcomings of existing literal and semantic similarity calculation methods, a new hybrid text similarity calculation method is proposed that measures the similarity between texts comprehensively.

Existing automatic summarization methods exhibit poor accuracy on long text and fail to meet users' performance needs. This paper therefore proposes a two-phase automatic summarization method for long text, named EA-LTS. First, it uses a graph-model-based hybrid text similarity calculation method to extract key sentences. Then, it builds a recurrent neural network encoder-decoder model with attention and pointer mechanisms to generate summaries.

Evaluation methods and automatic summarization technology advance together, and a high-quality evaluation method is a long-term foundation for the development of summarization technology. This paper analyzes the existing system of evaluation methods in depth and finds that neither the external nor the internal evaluation methods consider semantic similarity. It therefore proposes a new evaluation method based on hybrid text similarity to make up for this gap.

The biggest bottleneck in the development of abstractive summarization technology is the lack of high-quality datasets. The experimental dataset for this paper was collected from real-world Chinese data by a self-designed topic crawler, and contains about 0.5M articles with their corresponding titles. Experiments on this real, large-scale long-text corpus verify the effectiveness of EA-LTS. Compared with several popular automatic summarization methods on the ROUGE and HTS indices, the results improve clearly. Compared with the benchmark RNN method, the HTS index improves by 25.8% (word) and 20.1% (char)...
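
The first phase of EA-LTS (key sentence extraction over a graph whose edges carry a hybrid literal and semantic similarity) can be illustrated with a minimal sketch. The abstract does not specify the concrete measures, so everything below is an assumption made for illustration: Jaccard token overlap stands in for the literal similarity, cosine similarity of averaged word vectors stands in for the semantic similarity, the weight `alpha` mixes the two, and a PageRank-style power iteration ranks the sentences. The names `hybrid_similarity`, `rank_sentences`, and `extract_key_sentences` are hypothetical and are not taken from the thesis.

```python
import math
from typing import Dict, List, Sequence

Embeddings = Dict[str, Sequence[float]]


def literal_similarity(a: List[str], b: List[str]) -> float:
    """Jaccard overlap of the two token sets (assumed literal measure)."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def mean_vector(tokens: List[str], emb: Embeddings, dim: int) -> List[float]:
    """Average the vectors of all tokens that have an embedding."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def semantic_similarity(a: List[str], b: List[str], emb: Embeddings, dim: int) -> float:
    """Cosine similarity of averaged word vectors (assumed semantic measure)."""
    va, vb = mean_vector(a, emb, dim), mean_vector(b, emb, dim)
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(x * x for x in vb))
    if na == 0.0 or nb == 0.0:
        return 0.0
    cos = sum(x * y for x, y in zip(va, vb)) / (na * nb)
    return max(0.0, cos)  # clamp so graph edge weights stay non-negative


def hybrid_similarity(a: List[str], b: List[str], emb: Embeddings, dim: int,
                      alpha: float = 0.5) -> float:
    """Weighted mix of literal and semantic similarity (alpha is an assumption)."""
    return (alpha * literal_similarity(a, b)
            + (1.0 - alpha) * semantic_similarity(a, b, emb, dim))


def rank_sentences(sentences: List[List[str]], emb: Embeddings, dim: int,
                   alpha: float = 0.5, damping: float = 0.85,
                   iterations: int = 50) -> List[float]:
    """PageRank-style scores on the sentence graph weighted by hybrid similarity."""
    n = len(sentences)
    if n == 0:
        return []
    weights = [[0.0 if i == j else
                hybrid_similarity(sentences[i], sentences[j], emb, dim, alpha)
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1.0 - damping) / n + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0.0)
            for i in range(n)
        ]
    return scores


def extract_key_sentences(sentences: List[List[str]], emb: Embeddings, dim: int,
                          k: int = 2, alpha: float = 0.5) -> List[int]:
    """Return indices of the k top-ranked sentences, in document order."""
    scores = rank_sentences(sentences, emb, dim, alpha)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)
```

In this sketch the sentences are passed in already tokenized, and the extracted sentences would then be handed to the second-phase encoder-decoder model as a shortened input. How the thesis actually weights the two similarity components, and which graph ranking algorithm it uses, is not stated in the abstract.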
Keywords/Search Tags: Deep Learning, Automatic Text Summarization, Text Similarity Calculation, Recurrent Neural Network, Graph Model, Sequence to Sequence Model, Attention Mechanism, Pointer Mechanism