The Application of an Improved Graph-Model-Based Automatic Summarization Method to Conference Data

Posted on: 2021-10-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yao
GTID: 2518306017998149
Subject: Applied Mathematics
Abstract/Summary:
This thesis studies the application of automatic summarization to conference data. Conference data has a simple structure, short sentences, and frequent repetition, so an improved graph-model summarization method is chosen: the summarization problem is solved by finding the vertices with the largest weight in a weighted graph. The core of TextRank, the graph-model summarization algorithm, is the calculation of sentence similarity, but the classic TextRank algorithm considers only sentence structure and ignores the semantic information of the sentence. This thesis aims to improve the similarity calculation.

Because the theme of a conference is uncertain, the task is treated as "multi-document" automatic summarization. If there are multiple topics, cluster analysis can be performed first and the TextRank algorithm then applied within each category. The Word2Vec algorithm is used to generate 300-dimensional word vectors, and sentence vectors are then built from the word vectors. Two sentence vector representations are presented. Model one treats every word as equivalent, that is, each word contributes equally to the sentence; model two instead takes each word's contribution to the sentence to be its TF-IDF value.

Before the sentence vectors are passed to the graph model, cluster analysis is performed on the data. The 300-dimensional sentence vectors are reduced with t-SNE and drawn as a scatter plot, which shows that the data are fairly concentrated. In addition, the Hopkins statistic of the data is 0.59. Taken together, these two results suggest that the data develop around a single theme.

With all other calculation steps kept the same, results are obtained for the two sentence vector representations and compared with the results of the classical graph model. ROUGE is chosen as the evaluation metric, and the comparison shows: (1) Improved model one performs best and is the most suitable for meeting data. (2) For meeting data, the best summary ratio is 5%. (3) In meeting data, or spoken-language data in general, every word contributes to the sentence to the same degree; in other words, it is unnecessary to consider the TF-IDF value of a word when calculating the sentence vector.
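
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: cosine similarity as the edge weight, the damping factor of 0.85, the fixed iteration count, and all function names (`mean_vector`, `weighted_vector`, `textrank`, `summarize`) are assumptions made for illustration, and the toy 2-dimensional vectors stand in for the 300-dimensional Word2Vec vectors.

```python
import math

def mean_vector(word_vecs):
    """Model one: every word contributes equally (plain average)."""
    dim = len(word_vecs[0])
    return [sum(v[i] for v in word_vecs) / len(word_vecs) for i in range(dim)]

def weighted_vector(word_vecs, weights):
    """Model two: each word is weighted, e.g. by its TF-IDF value.
    Dividing by the weight sum is an assumption; cosine similarity
    is unaffected by this scaling."""
    total = sum(weights)
    dim = len(word_vecs[0])
    return [sum(w * v[i] for v, w in zip(word_vecs, weights)) / total
            for i in range(dim)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def textrank(sent_vecs, damping=0.85, iters=50):
    """Weighted TextRank scores on an undirected sentence graph."""
    n = len(sent_vecs)
    # Edge weight = cosine similarity (assumed), clipped at 0, no self-loops.
    w = [[max(cosine(sent_vecs[i], sent_vecs[j]), 0.0) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]  # total edge weight at each vertex
    scores = [1.0] * n
    for _ in range(iters):
        scores = [(1 - damping) + damping * sum(
                      w[j][i] / out[j] * scores[j]
                      for j in range(n) if out[j] > 0)
                  for i in range(n)]
    return scores

def summarize(sent_vecs, ratio=0.05):
    """Indices of the top-scoring sentences, in document order."""
    scores = textrank(sent_vecs)
    k = max(1, round(ratio * len(sent_vecs)))
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)

# Toy usage: three "sentences" given directly as 2-d sentence vectors.
sentence_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
selected = summarize(sentence_vectors, ratio=0.05)
```

Swapping `mean_vector` for `weighted_vector` when building the sentence vectors, while leaving `textrank` and `summarize` unchanged, mirrors the comparison in the abstract, where only the sentence vector representation differs between the two improved models.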
Keywords/Search Tags: Weighted Undirected Graph, Automatic Abstract, Similarity, TextRank Algorithm, TF-IDF Value