Research And Implementation Of Multi-document Automatic Summarization Technology Based On Topic Model

Posted on: 2020-10-11    Degree: Master    Type: Thesis
Country: China    Candidate: Y L Wu
GTID: 2428330623959911    Subject: Computer technology
Abstract/Summary:
With the rapid development of information technology, Internet applications have penetrated every aspect of people's work and life. The explosive growth of information makes it difficult for people to find what they need quickly and accurately, so analyzing key data and capturing key content from massive data has become a challenging task. Multi-document automatic summarization is one way to address this problem: it generates generalized text that condenses related content into a concise, low-redundancy summary. In recent years, many new hybrid methods have been proposed, and automatic summarization technology has steadily improved. This thesis analyzes the topic model methods commonly used in automatic summarization, improves the topic model to address the deficiencies of existing methods, and improves the summary sentence ordering method. On this basis, a multi-document automatic summarization method based on a topic model is proposed. The research work includes the following points:

(1) The topic model is improved to address inaccurate topic classification. The topic model is assumed to be a mixture of a document-specific model and a background model, and the document-word, topic-word, and document-topic probabilities are re-estimated. With this approach, words and topics that can be explained by the background model no longer appear in the document-specific topic model, so topic classification is more accurate and the words within each topic are more consistent.

(2) To address the ordering of summary sentences, the precedence relationship between two summary sentences is determined by comparing sentence similarity. A directed graph is constructed with summary sentences as vertices and succession relations between sentences as edges, and the order of the summary sentences is obtained by ordering this directed graph.

(3) A multi-document automatic summarization system based on the topic model is implemented. It combines the topic model with multiple statistical features and can respond to queries. Redundancy is reduced with the MMR (maximal marginal relevance) method, and the summary is generated after sentence ordering.

Experiments are carried out on the topic model re-estimation method and the multi-document automatic summarization method proposed in this thesis. The results show that the re-estimation method improves topic purity and yields better classification, and that the summarization method produces comprehensive, well-organized summaries.
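
Point (1) describes re-estimating probabilities under a mixture of a document-specific model and a background model. The abstract does not give the update equations, so the following is only a minimal sketch of one common way to fit such a mixture: EM for a single document's word distribution, with the background distribution held fixed. The function name `reestimate_with_background`, the mixing weight `lam`, and the toy counts are assumptions made for illustration. The thesis additionally re-estimates topic-word and document-topic probabilities, which this sketch does not cover.

```python
def reestimate_with_background(doc_counts, background, lam=0.5, iters=50):
    """EM for p(w|d) = lam * p(w|B) + (1 - lam) * p(w|theta_d).

    doc_counts: word -> count in the document (counts must be positive).
    background: word -> background probability p(w|B), held fixed.
    lam:        assumed weight of the background model (hypothetical default).
    Returns the re-estimated document-specific distribution theta_d.
    """
    vocab = list(doc_counts)
    # Start from a uniform document-specific distribution.
    theta = {w: 1.0 / len(vocab) for w in vocab}
    for _ in range(iters):
        # E-step: probability that an occurrence of w was generated by the
        # document-specific model rather than by the background model.
        resp = {}
        for w in vocab:
            specific = (1.0 - lam) * theta[w]
            # Small floor avoids division by zero for unseen background words.
            general = lam * background.get(w, 1e-12)
            resp[w] = specific / (specific + general)
        # M-step: renormalise the counts attributed to the document model.
        weighted = {w: doc_counts[w] * resp[w] for w in vocab}
        total = sum(weighted.values())
        theta = {w: weighted[w] / total for w in vocab}
    return theta


# Toy example: "the" is frequent in the document but also in the background,
# so its document-specific probability drops below its raw frequency (0.5).
counts = {"the": 10, "topic": 5, "model": 5}
background = {"the": 0.9, "topic": 0.05, "model": 0.05}
theta = reestimate_with_background(counts, background)
```

In this toy case, probability mass moves away from the common word and toward the content words, which is the effect the abstract attributes to the background model.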
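
Point (2) orders summary sentences through a directed graph of succession relations. As a rough sketch under stated assumptions, the code below takes a caller-supplied `precedence(a, b)` score (hypothetical; the abstract says only that the relation is derived from sentence similarity), adds an edge a -> b whenever a is more likely to precede b than the reverse, and then linearises the graph greedily by net out-degree. The greedy linearisation is a stand-in chosen here because pairwise preferences can contain cycles. It is not necessarily the ordering procedure used in the thesis.

```python
from itertools import permutations


def order_sentences(sentences, precedence):
    """Order sentences using a directed graph of pairwise precedence.

    precedence(a, b) is an assumed scoring function: a higher value means
    sentence a is more likely to come before sentence b.
    """
    n = len(sentences)
    # Edge (i, j) means sentence i should appear before sentence j.
    edges = {
        (i, j)
        for i, j in permutations(range(n), 2)
        if precedence(sentences[i], sentences[j])
        > precedence(sentences[j], sentences[i])
    }
    remaining = set(range(n))
    ordered = []
    while remaining:
        def net_out_degree(i):
            # Outgoing minus incoming edges, counting only unplaced vertices.
            out_deg = sum((i, j) in edges for j in remaining)
            in_deg = sum((j, i) in edges for j in remaining)
            return out_deg - in_deg

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=net_out_degree)
        ordered.append(sentences[best])
        remaining.remove(best)
    return ordered


# Toy precedence: alphabetical order stands in for a similarity-based score.
toy_precedence = lambda a, b: 1.0 if a < b else 0.0
result = order_sentences(["c", "a", "b"], toy_precedence)
```

When the precedence relation is acyclic, this greedy pass yields a topological order of the graph. With cycles, it still returns a complete ordering instead of failing.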
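
Point (3) reduces redundancy with MMR, which repeatedly picks the sentence that is most relevant to the query and least similar to the sentences already chosen. The sketch below illustrates that selection rule with a plain bag-of-words cosine similarity. The similarity measure, the trade-off weight `lam`, and the function names are assumptions for illustration. The thesis combines a topic model with several statistical features, and those are not modelled here.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def mmr_select(sentences, query, k=2, lam=0.7):
    """Pick up to k sentences by maximal marginal relevance.

    score(s) = lam * sim(s, query) - (1 - lam) * max sim(s, already selected)
    """
    vecs = [Counter(s.lower().split()) for s in sentences]
    q = Counter(query.lower().split())
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to any chosen sentence.
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * cosine(vecs[i], q) - (1.0 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]


# Toy example: the duplicate sentence is skipped in favour of new content.
docs = [
    "topic models find themes",
    "topic models find themes",
    "graphs order summary sentences",
]
summary = mmr_select(docs, "topic models summary", k=2)
```

A larger `lam` favours relevance to the query, while a smaller one penalises overlap with the sentences already in the summary more heavily.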
Keywords: Automatic summarization, Topic model, Topic model re-estimation, Graph-based sorting method, Multi-feature fusion