
Research On Short Text Automatic Summary Model For Web Reviews

Posted on: 2021-07-25
Degree: Master
Type: Thesis
Country: China
Candidate: Z H He
GTID: 2518306104495454
Subject: Software engineering
Abstract/Summary:
Automatic text summarization uses a model or algorithm to extract a short summary from a text. Short text summarization is typically used to extract opinions from online reviews and to compress short texts. The mainstream approach at present is generative (abstractive) summarization. However, generative summarization suffers from high task complexity, strong dependence on training data, repeated output, and disfluent results, which makes it hard to use in practice. Deletion-based sentence compression models instead compress a sentence by extracting its keywords, but they struggle to extract and summarize across multiple sentences, and the target compression rate is hard to reach.

To better meet the requirements of short text summarization in real production, this thesis proposes a short text summary generation model that combines keyword extraction with sentence discrimination. Building on the traditional deletion-based sentence compression model, a keyword extraction model is designed to extract keywords, and a rule-based recombination algorithm reorders them to generate multiple candidate summary sentences. A discrimination model then scores the candidates and outputs the highest-scoring sentence. The pre-trained language model BERT is introduced into the network design to improve performance: it greatly reduces the model's dependence on sample size and improves its extraction of deep semantics. The discrimination model ensures that the final output is readable and grammatically complete.

Finally, a small-scale Chinese summarization data set was constructed by manually annotating real user reviews from a game forum. Experiments on this data set evaluated the model with the ROUGE metric. The model achieved a ROUGE-1 score of 0.66 and a ROUGE-L score of 0.53, both higher than those of the reproduced pointer-generator generative model and the deletion-based LSTM model. The experiments show that the improvements proposed in this thesis effectively improve the quality of the generated summaries.
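
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how ROUGE-1 and ROUGE-L can be computed for a single candidate/reference pair. It is not the thesis's evaluation code: the abstract does not say which ROUGE variant or tokenization was used, so the F1 form and character-level tokens (a common choice for Chinese text) are assumptions, and the function names are illustrative only.

```python
from collections import Counter


def _f1(overlap, cand_len, ref_len):
    """Combine precision and recall into F1; return 0.0 when there is no overlap."""
    if overlap == 0 or cand_len == 0 or ref_len == 0:
        return 0.0
    precision = overlap / cand_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_1(candidate, reference):
    """ROUGE-1 F1 over character unigrams (assumed tokenization)."""
    cand, ref = list(candidate), list(reference)
    # Clipped unigram overlap: each token counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return _f1(overlap, len(cand), len(ref))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence of characters."""
    cand, ref = list(candidate), list(reference)
    return _f1(lcs_length(cand, ref), len(cand), len(ref))
```

ROUGE-1 ignores order, while ROUGE-L rewards tokens that appear in the same order as in the reference. A candidate that contains the right keywords in the wrong order therefore scores well on ROUGE-1 but lower on ROUGE-L, which is why both are informative for a model that reorders extracted keywords.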
Keywords/Search Tags: Short text, Auto-summarization, Language model, Keyword extraction