
Research on Graph-based Key Phrase Extraction

Posted on: 2021-04-13  Degree: Master  Type: Thesis
Country: China  Candidate: H Yin
GTID: 2568306905451394  Subject: Software engineering
Abstract/Summary:
With the development of Internet technology, the volume of text information is growing exponentially, and people spend a great deal of time and energy reading and processing it. Keyphrase technology obtains the important information in a document quickly. It alleviates, to some extent, the problems caused by the information explosion, and it is widely used in Information Retrieval, Text Classification, Question Answering and other tasks.

Keyphrase technology can be divided into keyphrase extraction and keyphrase generation, according to how the phrases are composed. The extractive method scores the importance of candidates in the original text and assembles phrases from the important words found there, while the generative method uses current Natural Language Processing techniques to generate a series of phrases from the original text. Compared with the extractive method, the generative method suffers from problems such as unknown words, redundant phrases and difficult training. In recent years, research on Deep Learning has greatly advanced keyphrase generation, but these problems have not been effectively solved, so the extractive method remains dominant in practical applications.

At present, the mainstream extractive methods are based on the word graph. Such a method takes the word graph as its object of study, calculates an importance score for each word in the graph, and finally determines the key phrases from the word scores. This thesis studies keyphrase extraction from the following three aspects:

(1) The EntropyRank algorithm is proposed, based on the TextRank algorithm. It uses the LDA topic model to learn the topic distribution of each word in a specific document, and takes the entropy of that distribution as the importance value of each word in the Random Walk process.

(2) The VSRank algorithm is proposed, based on the TextRank algorithm. It calculates the correlation between words at the word level and at the sentence level, and rebuilds the edge weight between two words in the word graph by fusing the correlations from the two levels.

(3) The T-DGI-KE algorithm is proposed, based on the Deep Graph Infomax (DGI) algorithm. It exploits the dominant role of the title within the whole document, and uses a Recurrent Neural Network and an Attention mechanism to enrich the topic representation of the document.

Finally, the thesis reports experiments on abstract, news and paper datasets, and analyzes the results quantitatively and qualitatively to verify the effectiveness of the proposed methods.
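The word-graph scoring that the abstract describes can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the co-occurrence window of 2, the damping factor of 0.85 and the iteration count are all assumptions chosen for illustration. The `prior` argument shows one possible way a per-word importance value (for example, one derived from the entropy of a word's topic distribution) could bias the random walk; the abstract does not state how EntropyRank maps entropy to importance, so that mapping is left to the caller.

```python
import math
from collections import defaultdict


def build_word_graph(tokens, window=2):
    """Undirected co-occurrence graph. The weight of an edge is the number
    of times two distinct words appear within `window` tokens of each other."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            v = tokens[j]
            if v != w:
                graph[w][v] += 1.0
                graph[v][w] += 1.0
    return graph


def entropy(dist):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def rank_words(graph, prior=None, damping=0.85, iters=50):
    """Weighted PageRank by power iteration.

    `prior` maps each word to a positive weight used in the teleport step.
    When omitted, the prior is uniform, which gives plain TextRank scoring.
    """
    nodes = list(graph)
    if prior is None:
        prior = {n: 1.0 for n in nodes}
    total = sum(prior[n] for n in nodes)
    prior = {n: prior[n] / total for n in nodes}
    scores = {n: 1.0 / len(nodes) for n in nodes}
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iters):
        new_scores = {}
        for n in nodes:
            # Each neighbor m passes on a share of its score proportional
            # to the weight of the edge (m, n).
            incoming = sum(
                scores[m] * graph[m][n] / out_weight[m] for m in graph[n]
            )
            new_scores[n] = (1 - damping) * prior[n] + damping * incoming
        scores = new_scores
    return scores
```

To use it, tokenize and filter a document, call `build_word_graph(tokens)`, then `rank_words(graph)`; adjacent high-scoring words can afterwards be merged into candidate phrases. Passing `prior={word: some_value}` replaces the uniform teleport distribution with a word-specific one, which is the general shape of a biased random walk.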
Keywords/Search Tags:keyphrase extraction, word network, word-score calculation, graph convolution