
Research On Chinese Text Summarization Method Based On Improved TextRank

Posted on: 2022-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: C Q Zhu
Full Text: PDF
GTID: 2518306476490844
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of the digital information age, information overload is becoming increasingly serious, and quickly obtaining key information from massive amounts of text is particularly important. Automatic text summarization has therefore become a hot research direction. To produce high-quality summaries, this thesis studies extractive summarization of Chinese text and optimizes the TextRank algorithm for this task.

(1) An improved TextRank algorithm for direct extraction of Chinese text summaries. The thesis first examines the factors that affect the quality of the generated summary. When the TextRank graph of a text is constructed, the nodes are changed from sentences to sentence vectors generated by the best-performing pre-trained model. Semantic information is introduced to optimize the similarity between sentences, yielding a group of candidate sentences, which is then screened using the idea of the Maximal Marginal Relevance algorithm. Experiments show that the method is more effective than the original TextRank algorithm.

(2) An improved TextRank algorithm for keyword extraction, with summaries extracted according to the keyword distribution. The more non-repeated keywords are covered, the better the meaning of the text is conveyed. Based on the advantages and disadvantages of existing keyword extraction algorithms and the characteristics of a single text, the thesis selects the TF-IDF and TextRank algorithms, improves them, and uses them to guide the generation of the keyword set. Each keyword in the generated set has its own weight score, and sentences are ranked according to the keywords they contain. Experimental results show that the method performs well in single-sentence summary extraction: Rouge-1, which reflects how much key information is captured, increases from 25.3% to 31.2%.

(3) TextRank optimized for summary extraction with the help of ensemble learning. The thesis corrects the extraction results of TextRank using several mainstream summary extraction algorithms, so that the model generalizes better and the extracted sentences carry more of the important information. Validated on data from China News Network, the improved model raises Rouge-1, Rouge-2 and Rouge-L from 34.4%, 46.6% and 21.4% to 40.2%, 55.4% and 29.6%, respectively, which shows that the method is superior to the original algorithm in summary extraction and generalizes better.
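The pipeline in contribution (1), TextRank over sentence vectors followed by Maximal Marginal Relevance screening, might look roughly like the Python sketch below. It is an illustration under stated assumptions, not the thesis's implementation: the abstract does not name the pre-trained model, the similarity measure, or any parameter values, so the sketch takes sentence vectors as given, assumes cosine similarity and a damping factor of 0.85, and uses hypothetical function names and a hypothetical `trade_off` weight.

```python
import math

def cosine(u, v):
    # Cosine similarity between two sentence vectors (assumed similarity measure).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def textrank_scores(vectors, damping=0.85, iterations=50):
    # Build a weighted graph whose nodes are sentence vectors and whose
    # edge weights are non-negative cosine similarities (no self-loops).
    n = len(vectors)
    sim = [[max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    # Power iteration of the weighted PageRank update used by TextRank.
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores, sim

def mmr_select(scores, sim, k, trade_off=0.7):
    # Greedy MMR: prefer high-ranked sentences, penalise similarity
    # to sentences that have already been selected.
    top = max(scores)
    relevance = [s / top for s in scores]  # rescale so the best score is 1
    selected, candidates = [], list(range(len(scores)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return trade_off * relevance[i] - (1 - trade_off) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)  # restore document order

# Toy vectors (made up): sentences 0 and 1 are duplicates, sentence 2 differs.
vectors = [[1.0, 0.1], [1.0, 0.1], [0.1, 1.0]]
scores, sim = textrank_scores(vectors)
summary_indices = mmr_select(scores, sim, k=2, trade_off=0.5)
```

On the toy input, MMR skips the duplicate sentence even though it ranks as highly as the first one, which is the redundancy-reduction effect the screening step is meant to provide.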
Keywords/Search Tags: Abstract extraction, Keywords extraction, TextRank, Maximal Marginal Relevance, Ensemble learning