Research on Automatic Text Summarization Methods

Posted on: 2021-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: X X Wang
Full Text: PDF
GTID: 2428330611497423
Subject: Computer Science and Technology
Abstract/Summary:
Automatic text summarization uses computational methods to extract and condense the important information that accurately reflects the content of a source text or text collection. The rapid growth of information has left people suffering from information overload: faced with massive amounts of information, they often cannot obtain what they need quickly and accurately. Automatic text summarization can effectively solve this problem by helping people obtain high-quality information from the Internet quickly and efficiently. At present, however, the quality of automatically generated summaries is still lacking, so how to extract text summaries efficiently is the main research topic of this dissertation.

This dissertation applies the TextRank algorithm to automatic text summarization and proposes SW-TextRank, a text summarization algorithm based on TextRank. The LDA topic model is introduced into the summary extraction process, which solves the problem that the TextRank algorithm cannot take the topic of the text into account. A BiLSTM-CRF model based on characters and words is used to recognize named entities in Chinese text and obtain useful information, which is then used to adjust the weights of word nodes and thereby improve the accuracy of the generated summaries. The main work of this dissertation consists of the following two parts:

(1) Because the TextRank algorithm ignores semantic relations between words and important global information when extracting summaries of Chinese text, we propose the SW-TextRank algorithm. Similarity between sentences is calculated from word vectors trained with Word2Vec, and the factors that affect sentence weight, such as sentence position, similarity between a sentence and the title, keyword coverage, key sentences, and clue words, are taken into account to optimize the sentence weights. Redundancy is removed from the group of candidate summary sentences, and the top-ranked sentences are selected and rearranged in their original order in the text to form the final summary. The method is verified through experiments.

(2) We propose a text summarization method that combines the LDA topic model with a BiLSTM-CRF entity recognition model. The method introduces the LDA model into summary generation and makes full use of the topic distribution it produces, so that the generated summary stays closer to the topic of the text. The optimized BiLSTM-CRF model is used to recognize named entities in the text, yielding the person, location, and organization information of the events described. This information is used to adjust the weights of the word nodes in the word-level TextRank graph. The SW-TextRank algorithm is then used to generate the final summary, and the method is verified through experiments.
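The general shape of the extractive pipeline described in part (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's SW-TextRank implementation: the toy `embeddings` dictionary stands in for Word2Vec vectors, `lead_boost` is a hypothetical stand-in for the sentence-position factor, and the `redundancy` threshold is an assumed way of dropping near-duplicate candidates. Title similarity, keyword coverage, clue words, the LDA topic distribution, and the BiLSTM-CRF entity weights are all omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def sentence_vector(words, embeddings, dim):
    """Average the vectors of the words that have an embedding."""
    known = [embeddings[w] for w in words if w in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]

def textrank(sim, damping=0.85, iterations=50):
    """Weighted TextRank scores by power iteration over a similarity matrix."""
    n = len(sim)
    scores = [1.0 / n] * n
    # Total outgoing edge weight of each sentence node (self-loops ignored).
    out_weight = [sum(sim[j][k] for k in range(n) if k != j) for j in range(n)]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n) if j != i and out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores

def summarize(sentences, embeddings, dim, k=2, redundancy=0.95, lead_boost=1.2):
    """Return the indices of up to k summary sentences in document order."""
    if not sentences:
        return []
    n = len(sentences)
    vecs = [sentence_vector(s, embeddings, dim) for s in sentences]
    # Negative similarities are clipped so that edge weights stay non-negative.
    sim = [[max(cosine(vecs[i], vecs[j]), 0.0) for j in range(n)] for i in range(n)]
    scores = textrank(sim)
    # Hypothetical position factor: boost only the first sentence.
    scores[0] *= lead_boost
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranked:
        # Skip a candidate that is nearly identical to an already chosen one.
        if all(sim[i][j] < redundancy for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return sorted(chosen)  # restore the original order of the text

# Toy 2-dimensional embeddings (assumed values, not trained vectors).
embeddings = {
    "cat": [1.0, 0.0], "dog": [0.9, 0.1],
    "stock": [0.0, 1.0], "market": [0.1, 0.9],
}
sentences = [["cat", "dog"], ["cat", "dog"], ["stock", "market"]]
summary = summarize(sentences, embeddings, dim=2, k=2)
print(summary)  # [0, 2]
```

In this toy input the second sentence duplicates the first, so it is dropped as redundant and the third sentence is selected instead, even though it has a lower graph score.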
Keywords/Search Tags: automatic summarization, sentence weight, TextRank, topic model, BiLSTM-CRF