
TextRank Keyword And Summarization Extraction Algorithm Based On Rough Data-deduction

Posted on: 2022-08-02
Degree: Master
Type: Thesis
Country: China
Candidate: W Q Shi
GTID: 2518306341986639
Subject: Software engineering
Abstract/Summary:
The rapid development of the Internet has led to explosive growth in online text data. While this brings convenience to users, it also makes it difficult for them to obtain the information they need quickly, accurately, and comprehensively from massive amounts of complex data. Keyword extraction and text summarization are two important research topics in natural language processing. Both aim to generate condensed content that describes the theme of a text and thereby reveals its key information. To mine the potential associations within text data, this thesis applies rough data-deduction theory to keyword and summarization extraction algorithms, focusing on the graph-based TextRank ranking algorithm.

TextRank is an effective graph-based keyword extraction algorithm that achieves high accuracy, but it has shortcomings. When the algorithm constructs the edges of the graph, the co-occurrence window rule considers only associations between locally adjacent words, which introduces considerable randomness and uncertainty. To address this issue, an improved TextRank keyword extraction algorithm based on rough data-deduction is proposed. Rough data-deduction can expand the scope of association, increase the amount of associated data, and yield more comprehensive results. Using the association rules of rough data-deduction, the proposed algorithm makes the following improvements: candidate keywords are classified according to word meaning, and the associations between candidate words in different classes are deduced by rough data-deduction. Experimental results show that, compared with the traditional TextRank algorithm, the improved algorithm achieves significantly higher extraction precision, which shows that rough data-deduction can effectively improve keyword extraction performance.

In addition, so that the improved TextRank algorithm accounts for the influence of external knowledge on keyword extraction, this thesis proposes a TextRank keyword extraction algorithm with word-vector clustering based on rough data-deduction. Building on the potential associations between candidate words mined by rough data-deduction, this algorithm introduces the word2vec model and uses it to train candidate word vectors for clustering. According to the clustering results, the nodes of the candidate keyword graph are weighted non-uniformly, so that knowledge external to the single text is integrated into the algorithm and its extraction performance improves. Experimental results show that this algorithm improves extraction performance to some extent compared with many existing improved algorithms.

Considering the influence of keywords on summarization results, this thesis also studies the TextRank automatic text summarization algorithm and finds that it still has several shortcomings: its extracted summaries are poorly related to the text topic, most existing algorithms do not consider the influence of keywords on the summary, and the external features used are relatively one-sided. To solve these problems, this thesis proposes a weighted graph model automatic text summarization algorithm based on rough data-deduction. This algorithm obtains the required keyword set from the preceding keyword research, uses the LDA topic model to mine the topic information of the text, and then integrates the overall structure of the text and the context of each candidate sentence. Experimental results show that the improved summarization algorithm achieves better extraction results than the classic algorithm.
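
For readers unfamiliar with the baseline, the following is a minimal Python sketch of the classic TextRank keyword ranking that the abstract takes as its starting point: candidate words become graph nodes, a co-occurrence window decides the edges, and a PageRank-style iteration scores the nodes. It is not the thesis's rough data-deduction method. The function names (`build_cooccurrence_graph`, `textrank`), the `window` convention, and the optional `node_weights` argument (a simple stand-in for non-uniform node weighting, for example from cluster membership) are assumptions made for illustration only.

```python
from collections import defaultdict


def build_cooccurrence_graph(words, window=2):
    """Link every pair of distinct words that fall inside the same window.

    `window` is the window length in tokens, so window=2 links only
    adjacent words. Edge weights count how often a pair co-occurs.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            other = words[j]
            if other != word:
                graph[word][other] += 1.0
                graph[other][word] += 1.0
    return graph


def textrank(graph, node_weights=None, damping=0.85, iterations=50):
    """Score nodes with a weighted PageRank-style iteration.

    `node_weights` (hypothetical parameter) biases the teleport term so
    that some nodes receive a larger prior; None means uniform weights.
    """
    nodes = sorted(graph)
    if not nodes:
        return {}
    if node_weights is None:
        node_weights = {n: 1.0 for n in nodes}
    total = sum(node_weights.get(n, 0.0) for n in nodes)
    prior = {n: node_weights.get(n, 0.0) / total for n in nodes}

    # Total edge weight leaving each node, used to normalise its votes.
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    scores = {n: 1.0 / len(nodes) for n in nodes}

    for _ in range(iterations):
        updated = {}
        for n in nodes:
            received = sum(
                scores[m] * w / out_weight[m] for m, w in graph[n].items()
            )
            updated[n] = (1.0 - damping) * prior[n] + damping * received
        scores = updated
    return scores


if __name__ == "__main__":
    tokens = (
        "rough data deduction expands word association "
        "rough data deduction improves textrank"
    ).split()
    ranking = textrank(build_cooccurrence_graph(tokens, window=2))
    for word, score in sorted(ranking.items(), key=lambda kv: -kv[1]):
        print(f"{word}\t{score:.4f}")
```

Because edges come only from the sliding window, two related words that never appear close together stay unconnected in this sketch. That is the locality limitation the abstract describes and that the proposed algorithms aim to overcome by deducing additional associations.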
Keywords/Search Tags: Rough Data-deduction, Keyword Extraction, Text Summarization, TextRank Algorithm