
Research On Automatic Generation Method Of Chinese Text Summarization

Posted on: 2021-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: X R Lian
Full Text: PDF
GTID: 2428330620963394
Subject: Software engineering
Abstract/Summary:
With the rapid development of artificial intelligence, the problem of information overload has seriously affected people's ability to live and work efficiently. Automatic text summarization technology compresses long text into short content, helping people find the information they need quickly. For Chinese automatic summarization, current techniques fail to achieve ideal results and focus mainly on extractive methods. In addition, the output of abstractive summarization is often not smooth and coherent, and the information it conveys is not comprehensive. Therefore, this paper proposes a hybrid model that combines extraction and abstraction: the model first uses a BERT-based extractive method to select summary sentences, and then feeds those sentences into an abstractive model to generate the final summary. The main research contents of this paper are as follows:

(1) Summary sentence extraction based on BERT. Extractive summarization methods mostly use shallow text features to score basic semantic units and then determine sentence weights from those scores. Because the contextual information of the text is ignored, the extracted summaries have poor coherence. To address this problem, this paper uses a BERT-based extractive model to learn deep semantic features. The model first represents each sentence of the document as a vector, then scores each sentence and sorts the scores from large to small, and finally extracts the sentences with the highest scores as the summary.

(2) Summarization generation method integrating a core-words attention mechanism. Aiming at the common problems of out-of-vocabulary (OOV) words and repeated words in abstractive summarization, this paper improves the Seq2Seq+Attention model with a pointer network and a coverage mechanism. On this basis, to address the problem that the subject information of the generated summary is not comprehensive, a summary generation method integrating a core-words attention mechanism is proposed.

(3) Automatic summary generation based on a hybrid model. Because the Seq2Seq model truncates long texts, resulting in loss of information, this paper proposes a hybrid model that combines the extractive and abstractive methods. First, the BERT-based extractive model is used to extract the important sentences of the article and construct a summary sentence set. Then, this sentence set is fed into the abstractive method with the core-words attention mechanism. In this way, a hybrid method combining extraction and generation is realized.
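The three-step extractive procedure described in contribution (1) — represent each sentence as a vector, score it, sort the scores from large to small, and keep the top sentences — can be sketched as follows. This is a minimal illustration only: the bag-of-words encoder, the centroid-similarity scoring rule, and the top-k cutoff are stand-in assumptions, not the thesis's actual BERT model.

```python
import math
from collections import Counter

def embed(sentence):
    # Stand-in for a BERT sentence encoder: a bag-of-words count
    # vector. The thesis uses deep contextual embeddings instead.
    return Counter(sentence.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def extract_summary(sentences, k=2):
    # 1. Represent each sentence of the document as a vector.
    vecs = [embed(s) for s in sentences]
    # 2. Score each sentence; here, by similarity to a document
    #    "centroid" (an illustrative scoring rule).
    centroid = Counter()
    for v in vecs:
        centroid.update(v)
    scores = [cosine(v, centroid) for v in vecs]
    # 3. Sort scores from large to small, keep the top-k sentences,
    #    and restore original document order for coherence.
    top = sorted(range(len(sentences)),
                 key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

In the hybrid model of contribution (3), the sentence set returned by such an extractor would then be passed to the abstractive Seq2Seq component, sidestepping its input-length truncation.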
Keywords/Search Tags: Automatic text summarization, BERT, Seq2Seq, Attention mechanism, Core words