
Research And Application Of Text Summarization Technology

Posted on: 2022-04-09 | Degree: Master | Type: Thesis
Country: China | Candidate: L C Xiao
GTID: 2518306524990369 | Subject: Master of Engineering
Abstract/Summary:
In modern society, with the rapid development of Internet technology and the arrival of the information age, the amount of information worldwide has grown explosively, and manual processing can no longer keep pace with people's needs. Automatic text summarization helps people quickly extract the important information from a text, which improves the efficiency of information acquisition. Automatic summarization methods fall into two main categories, extractive and abstractive, and each suits different usage scenarios. To meet these different application requirements, this thesis studies the extractive TextRank algorithm and the abstractive PreSumm model, proposes improvements that address their existing problems, and thereby improves the quality of the generated summaries. The main work of this thesis is as follows:

(1) An improved algorithm based on TextRank, FB-TextRank, is proposed. TextRank considers sentence features incompletely and computes sentence similarity coarsely, which leads to poor-quality summaries. To address the incomplete feature set, this thesis defines sentence-position and keyword features together with a method for computing them, enriching the sentence features. To address the coarse similarity calculation, this thesis compares text representations under different models and granularities and, based on the experimental results, selects BERT sentence vectors, which make the similarity calculation more accurate. Combining these two improvements yields the FB-TextRank algorithm. Experiments on the CNN/Daily Mail dataset show that FB-TextRank achieves higher ROUGE scores than the other algorithms compared, which verifies the effectiveness of the improvements.

(2) An improved model based on PreSumm, BT-Summ, is proposed. PreSumm loses semantic information and decodes slowly, which results in poor summary quality and low computational efficiency. To address the missing semantic information, this thesis defines a layered encoding of text positions together with rules for computing it, which retains the semantic information of the original text completely. To address the slow decoding, this thesis improves the Beam Search algorithm with two-step pruning, which speeds up Beam Search during decoding. Combining these two improvements yields the BT-Summ model. Experiments on the CNN/Daily Mail dataset show that BT-Summ achieves higher ROUGE scores than the other models compared, which verifies the effectiveness of the improvements.

(3) A text summarization system is designed and implemented. Based on the specific requirements of a network-data intelligent processing project and a scientific and technological data intelligent analysis project, and building on the FB-TextRank algorithm and BT-Summ model proposed in this thesis, the requirements analysis, overall design, detailed design, implementation, testing, and application of the text summarization system are completed.
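To make the general idea behind item (1) concrete, here is a minimal Python sketch of a TextRank-style extractive summarizer. Sentences form a graph weighted by pairwise similarity, a weighted PageRank iteration scores them, and that score is blended with position and keyword features. This is not the thesis's FB-TextRank. It assumes term-count cosine similarity as a stand-in for BERT sentence vectors, and the position feature, keyword-density feature, weights `alpha`, `beta`, `gamma`, and the function names are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase alphanumeric tokens (a stand-in for real preprocessing)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity of two term-count Counters.

    Hypothetical stand-in for cosine similarity between BERT sentence vectors.
    """
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def textrank_scores(vectors, d=0.85, iters=50):
    """Weighted PageRank over the sentence-similarity graph, by power iteration."""
    n = len(vectors)
    w = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Each sentence receives score from neighbours in proportion to edge weight.
        scores = [
            (1 - d) / n
            + d * sum(w[j][i] / out_weight[j] * scores[j]
                      for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(text, k=3, keywords=(), alpha=0.6, beta=0.2, gamma=0.2):
    """Pick k sentences by blending graph, position, and keyword scores.

    Feature definitions and weights are hypothetical, not FB-TextRank's.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    vectors = [Counter(tokenize(s)) for s in sentences]
    graph_scores = textrank_scores(vectors)
    top = max(graph_scores)
    n = len(sentences)
    kw = {word.lower() for word in keywords}
    combined = []
    for i, vec in enumerate(vectors):
        position = 1.0 - i / n  # hypothetical: earlier sentences score higher
        # Hypothetical keyword feature: share of tokens that are keywords.
        keyword = sum(vec[word] for word in kw) / max(1, sum(vec.values()))
        combined.append(alpha * graph_scores[i] / top + beta * position + gamma * keyword)
    ranked = sorted(range(n), key=lambda i: combined[i], reverse=True)
    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(ranked[:k])]
```

For example, `summarize(text, k=2, keywords=["cats"])` returns the two highest-scoring sentences in document order. Replacing `cosine` over term counts with cosine over sentence embeddings would change only the similarity step; the graph scoring and feature blending stay the same.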
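The abstract does not specify the two pruning steps used in BT-Summ's improved Beam Search. As a hedged illustration only, the Python sketch below shows ordinary beam search with two common kinds of pruning that cut the work done per decoding step: each hypothesis expands only its `local_k` most likely next tokens, and candidates scoring more than `margin` below the step's best are discarded before the beam is refilled. The toy bigram table stands in for a decoder's next-token distribution; `beam_search`, `toy_step`, `TABLE`, and all parameters are hypothetical names, not the thesis's.

```python
import heapq
import math


def beam_search(step_logprobs, bos, eos, beam_size=3, max_len=10, local_k=3, margin=5.0):
    """Beam search with two illustrative pruning steps (hypothetical, not BT-Summ's rule).

    step_logprobs(prefix) must return a dict mapping next token -> log-probability.
    Returns the best (score, sequence) pair, or None if nothing was generated.
    """
    beams = [(0.0, [bos])]
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            dist = step_logprobs(seq)
            # Pruning step 1: expand only the local_k most likely tokens per hypothesis.
            for tok, lp in heapq.nlargest(local_k, dist.items(), key=lambda kv: kv[1]):
                candidates.append((score + lp, seq + [tok]))
        if not candidates:
            break
        best = max(s for s, _ in candidates)
        # Pruning step 2: drop candidates far below the current best score.
        candidates = [c for c in candidates if c[0] >= best - margin]
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_size]:
            if seq[-1] == eos:
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break
    finished.extend(beams)
    return max(finished, key=lambda c: c[0]) if finished else None


# Hypothetical toy "decoder": next-token log-probabilities conditioned on the last token.
TABLE = {
    "<s>": {"the": math.log(0.6), "a": math.log(0.4)},
    "the": {"cat": math.log(0.5), "dog": math.log(0.4), "</s>": math.log(0.1)},
    "a": {"cat": math.log(0.9), "</s>": math.log(0.1)},
    "cat": {"</s>": math.log(0.9), "sat": math.log(0.1)},
    "dog": {"</s>": 0.0},
    "sat": {"</s>": 0.0},
}


def toy_step(prefix):
    return TABLE.get(prefix[-1], {"</s>": 0.0})


if __name__ == "__main__":
    best = beam_search(toy_step, "<s>", "</s>", beam_size=2, local_k=2)
    print(best[1])  # ['<s>', 'a', 'cat', '</s>']
```

With `beam_size=2` the search finds "a cat" (probability 0.4 × 0.9 × 0.9 = 0.324), which greedy decoding misses because it commits to "the" first and ends with 0.6 × 0.5 × 0.9 = 0.27. Setting `margin=0.0` prunes so aggressively that the search falls back to the greedy result, which illustrates the speed/quality trade-off any pruning rule has to balance.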
Keywords/Search Tags: text summary, extractive, FB-TextRank algorithm, abstractive, BT-Summ model