Joint Scoring Automatic Text Summarization Generation Based On TextRank Algorithm

Posted on: 2022-10-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Zhu
GTID: 2518306323984679
Subject: Computer application technology
Abstract/Summary:
With the rapid development of internet technology, people are being overwhelmed by text information. The scale of data and the volume of text have increased rapidly, and the amount of text now far exceeds what humans can process. How to efficiently extract important and useful information from this mass of text, and so reduce reading pressure, is a problem that needs to be solved in the era of big data. Automatic text summarization can condense complex articles into abstracts, which helps readers understand the main content and structure of an article and reduces reading pressure. Readers can then accurately grasp the reading direction and find the content they need. For these reasons, automatic text summarization has become one of the research hotspots of natural language processing. It has also flourished in many other applications, such as search engines, report summary generation, and article compression.

This thesis analyzes the TextRank algorithm on the news data set published by Sogou and improves the accuracy and recall rate of automatic summary extraction by improving the algorithm. We propose corresponding improvements for three problems of the algorithm: it ignores sentence position information, it ignores sentence semantic information, and its edge weight calculation is unsatisfactory.

The work of this thesis mainly includes the following aspects:
(1) Text preprocessing of news data, including removal of irrelevant text that affects summarization, word segmentation, removal of stop words, and text representation.
(2) Using TF-IDF to represent the sentences in a text, and calculating the similarity between sentences based on the sentence vectors.
(3) Using the skip-gram word vector model to train word vectors on external documents, obtaining the word vectors of the internal document from the external word vectors, and then calculating the similarity between sentences in the internal document.
(4) Combining the TextRank algorithm and the TF-IDF algorithm, adding the position information of the sentence, and modifying the weight formula.
(5) Constructing a new sentence similarity matrix based on the sentence similarities calculated by TF-IDF and Word2Vec.

The innovations of this thesis are:
(1) In the fusion of the TF-IDF algorithm and the TextRank algorithm, the position information of the sentence is added, which increases the score of important sentences to a certain extent.
(2) By calculating sentence similarity with both TF-IDF and Word2Vec, a new sentence similarity matrix is constructed. Word frequency and semantics are integrated, and the edge weight calculation is improved.
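The pipeline described above can be illustrated with a minimal sketch. It is not the thesis's actual implementation: the abstract does not give the modified weight formula, so the smoothed IDF variant, the linear fusion weight `alpha`, the position bonus `beta`, and the function names `tfidf_vectors`, `cosine`, `textrank`, and `summarize` are all assumptions made for illustration. The semantic similarity matrix `semantic_sim` is taken as an input and is assumed to come from some external step such as comparing averaged Word2Vec sentence vectors.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Build sparse TF-IDF vectors, treating each tokenized sentence as a document.

    Uses a smoothed IDF, ln((1 + n) / (1 + df)) + 1, which is one common variant
    and an assumption here.
    """
    n = len(sentences)
    df = Counter(term for sent in sentences for term in set(sent))
    vectors = []
    for sent in sentences:
        tf = Counter(sent)
        vectors.append({
            term: (count / len(sent)) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def textrank(sim, damping=0.85, iterations=50):
    """Weighted TextRank: iterate the score update over a similarity matrix."""
    n = len(sim)
    scores = [1.0] * n
    # Total outgoing edge weight of each sentence node (self-loops excluded).
    out_weight = [sum(sim[j][k] for k in range(n) if k != j) for j in range(n)]
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if j != i and out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, top_k=2, alpha=0.5, beta=0.3, semantic_sim=None):
    """Return indices of the top_k sentences, in original document order.

    alpha (fusion weight) and beta (position bonus) are hypothetical parameters.
    """
    n = len(sentences)
    vecs = tfidf_vectors(sentences)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    if semantic_sim is not None:
        # Assumed fusion: linear mix of word-frequency and semantic similarity.
        sim = [[alpha * sim[i][j] + (1 - alpha) * semantic_sim[i][j] if i != j else 0.0
                for j in range(n)] for i in range(n)]
    base = textrank(sim)
    # Assumed position weight: earlier sentences in a news article get a larger bonus.
    position = [1.0 + beta * (1.0 - i / n) for i in range(n)]
    joint = [b * p for b, p in zip(base, position)]
    ranked = sorted(range(n), key=lambda i: joint[i], reverse=True)[:top_k]
    return sorted(ranked)


if __name__ == "__main__":
    docs = [
        ["stock", "market", "rose"],
        ["market", "rose", "sharply"],
        ["weather", "sunny"],
        ["stock", "market", "fell"],
    ]
    print(summarize(docs, top_k=2))
```

In this sketch the position weight multiplies the final TextRank score, which is only one way to combine the two signals; the position information could instead be folded into the edge weights or into the teleport term of the iteration.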
Keywords/Search Tags: Automatic Text Summarization, TextRank, TF-IDF, Word2Vec, Sentence Similarity