
Research On Mutually Reinforced Manifold-ranking For Multi-document Summarization

Posted on: 2019-12-16
Degree: Master
Type: Thesis
Country: China
Candidate: W H You
GTID: 2428330569477275
Subject: Software engineering
Abstract/Summary:
With the rise of the Internet, the amount of data on the network is growing exponentially and the pace of daily life is quickening. How to obtain the required information efficiently from massive data has become an urgent problem. Automatic summarization can fuse and compress text, reducing its size while retaining the important content of the documents, and is therefore a key technology for solving this problem. This thesis takes online English news as its research object, with a view to providing users with concise and comprehensive summaries and improving the efficiency of their information acquisition.

This thesis proposes a multi-document automatic summarization method based on mutually reinforced manifold-ranking. Through mutual reinforcement among words, sentences, and theme clusters, the method improves the quality of sentence extraction and reduces redundant information. The method is applied in a query-based multi-document automatic summarization model. The main contents and conclusions of this study are as follows:

1. Theme cluster recognition
After data preprocessing, the word set, sentence set, and cluster set are obtained, and the data objects are grouped with respect to a given query. Objects in the same class have high similarity to one another and objects in different classes have low similarity, which yields the clustering result. Each cluster is then compared with the given query, and the clusters with a higher degree of similarity to the query are taken as theme clusters.

2. Sentence ranking and redundancy control
Through relevance propagation among the word set, sentence set, and theme cluster set, we build a mutually reinforcing summary extraction model over the three sets. A weighted graph is built for each group of objects (word set, sentence set, and theme cluster set), in which the vertices represent the query, words, sentences, and theme clusters. The word set, sentence set, and theme cluster set reinforce one another; the two processes can be carried out in turn or in combination until a globally stable state is reached, at which point every data object has a ranking score. The data objects are then filtered to remove redundant information, and the high-scoring objects are extracted to generate the summary.

3. Summary performance evaluation
A comprehensive experimental study is carried out to verify the effectiveness of the two algorithms, using the automatic evaluation toolkit ROUGE. ROUGE measures the quality of a summary by calculating the overlap between system-generated summaries and human-written reference summaries. We analyze precision and recall and compare the method with other summarization methods. On the TAC 2008 A, TAC 2008 B, TAC 2009 A, and TAC 2009 B data sets, the multi-document automatic summarization method based on mutually reinforced manifold-ranking studied in this thesis achieves ROUGE scores comparable to those of the top three systems that participated in the DUC/TAC competitions. A deviation test shows that the deviation is low, so the results of the experimental analysis can be used as conclusions. This further illustrates the necessity of integrating word-level and theme cluster information into automatic summarization methods.
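The ranking step can be illustrated with the standard single-graph manifold-ranking iteration, f ← αSf + (1 − α)y, where S is the symmetrically normalized affinity matrix and y marks the query vertex. The following is only a minimal sketch under stated assumptions: it is not the mutually reinforced variant of this thesis, whose update rules are not given in the abstract. The function name `manifold_rank`, the value alpha = 0.6, and the toy affinity matrix are hypothetical and chosen purely for illustration.

```python
import math


def manifold_rank(W, y, alpha=0.6, tol=1e-8, max_iter=1000):
    """Iterate f <- alpha * S f + (1 - alpha) * y until the scores stabilize.

    W: symmetric, non-negative affinity matrix (list of lists, zero diagonal).
    y: prior vector, 1.0 for the query vertex and 0.0 for all other vertices.
    """
    n = len(W)
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}, with D the degree matrix.
    deg = [sum(row) for row in W]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    S = [[inv_sqrt[i] * W[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]

    f = list(y)
    for _ in range(max_iter):
        new_f = [
            alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
            for i in range(n)
        ]
        # Stop once no score changes by more than tol (global stability).
        if max(abs(a - b) for a, b in zip(new_f, f)) < tol:
            return new_f
        f = new_f
    return f


# Toy graph (assumed values): vertex 0 is the query, vertices 1-3 are sentences.
W = [
    [0.0, 0.8, 0.1, 0.0],
    [0.8, 0.0, 0.5, 0.1],
    [0.1, 0.5, 0.0, 0.2],
    [0.0, 0.1, 0.2, 0.0],
]
y = [1.0, 0.0, 0.0, 0.0]

scores = manifold_rank(W, y)
# Rank the sentence vertices (indices 1..3) by descending score.
ranked = sorted(range(1, len(scores)), key=lambda i: scores[i], reverse=True)
print(ranked)
```

In this toy graph the sentence most strongly connected to the query ranks first, and sentences connected to it only indirectly still receive a score through propagation. A mutually reinforced version would presumably couple several such graphs (words, sentences, theme clusters), which this sketch does not attempt.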
Keywords/Search Tags:Mutual reinforcement of relevance, Clustering Algorithm, ROUGE, Automatic summary