Research On Graph-based Key Phrase Extraction

Posted on:2021-04-13

Degree:Master

Type:Thesis

Country:China

Candidate:H Yin

GTID:2568306905451394

Subject:Software engineering

Abstract/Summary:

With the development of Internet technology, textual information is growing exponentially, and people spend a great deal of time and energy reading and processing it. Keyphrase technology offers a quick way to obtain the important information in a document. It can partly alleviate the problems caused by information overload, and it is widely used in information retrieval, text classification, question answering systems and other tasks.

According to how the phrases are produced, keyphrase technology can be divided into keyphrase extraction and keyphrase generation. The extractive method evaluates the importance of phrases in the original text and assembles important words from that text into phrases, while the generative method uses current natural language processing techniques to generate a series of phrases from the original text. Compared with the extractive method, the generative method suffers from problems such as unknown words, redundant phrases and difficult training. In recent years, research on deep learning has greatly advanced keyphrase generation, but these problems have not been effectively solved. The extractive method therefore still holds the dominant position in practical applications.

At present, mainstream extractive methods are based mainly on the word graph. Such a method takes the word graph as its object of study, computes an importance score for each word in the graph, and finally determines the keyphrases from these word scores. This thesis studies extractive keyphrase methods in the following three aspects:

(1) The EntropyRank algorithm, proposed on the basis of the TextRank algorithm. It uses the LDA topic model to learn the topic distribution of each word in a specific document, and takes the entropy of that distribution as the importance value of the word in the random walk.
(2) The VSRank algorithm, also proposed on the basis of TextRank. It computes the correlation between words at the word level and at the sentence level, and reconstructs the weight of the edge between two words in the word graph by fusing the correlations from the two levels.
(3) The T-DGI-KE algorithm, proposed on the basis of the Deep Graph Infomax (DGI) algorithm. It exploits the dominant role of the title in the document, using a recurrent neural network and an attention mechanism to enrich the topic representation of the document.

Finally, the thesis reports experiments on datasets of abstracts, news articles and papers, and analyzes the results quantitatively and qualitatively to verify the effectiveness of the proposed methods.
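The word-graph pipeline summarized in the abstract (build a word graph, score words with a random walk, then rank phrases by their word scores) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation. The co-occurrence window, the linear fusion weight `lam` that combines word-level and sentence-level correlation (loosely in the spirit of VSRank), the use of topic-distribution entropy as the restart weight of a biased PageRank (one possible reading of EntropyRank), and scoring a phrase as the sum of its word scores are all assumptions. Every function name here is hypothetical, and the topic distributions are hard-coded rather than learned with LDA.

```python
import math
from collections import defaultdict


def topic_entropy(dist):
    """Shannon entropy (natural log) of a word's topic distribution."""
    total = sum(dist)
    return -sum((p / total) * math.log(p / total) for p in dist if p > 0)


def build_graph(sentences, window=2, lam=0.5):
    """Build an undirected word graph from tokenised sentences.

    Word-level correlation: co-occurrence count inside a sliding window.
    Sentence-level correlation: number of sentences containing both words.
    Edge weight = lam * word_level + (1 - lam) * sentence_level
    (an ASSUMED fusion rule, not taken from the thesis).
    """
    word_co = defaultdict(float)
    sent_co = defaultdict(float)
    for sent in sentences:
        # Pairs of distinct words at most window - 1 positions apart.
        for i, u in enumerate(sent):
            for v in sent[i + 1:i + window]:
                if u != v:
                    word_co[frozenset((u, v))] += 1
        # Every pair of distinct words that share this sentence.
        uniq = sorted(set(sent))
        for i, u in enumerate(uniq):
            for v in uniq[i + 1:]:
                sent_co[frozenset((u, v))] += 1
    graph = defaultdict(dict)
    for pair in set(word_co) | set(sent_co):
        w = lam * word_co.get(pair, 0.0) + (1 - lam) * sent_co.get(pair, 0.0)
        if w > 0:  # keep only positive edges so every node has out-weight
            u, v = tuple(pair)
            graph[u][v] = w
            graph[v][u] = w
    return dict(graph)


def biased_pagerank(graph, prior=None, d=0.85, iters=50):
    """Weighted PageRank whose restart distribution follows `prior`.

    With prior=None this reduces to plain weighted TextRank.
    """
    nodes = sorted(graph)
    raw = {n: 1.0 if prior is None else prior.get(n, 0.0) for n in nodes}
    z = sum(raw.values())
    if z == 0:  # e.g. every prior is zero: fall back to a uniform restart
        raw, z = {n: 1.0 for n in nodes}, float(len(nodes))
    teleport = {n: raw[n] / z for n in nodes}
    out_w = {n: sum(graph[n].values()) for n in nodes}
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        # Mass flows from neighbour m to n in proportion to w / out_w[m].
        score = {
            n: (1 - d) * teleport[n]
            + d * sum(score[m] * w / out_w[m] for m, w in graph[n].items())
            for n in nodes
        }
    return score


def score_phrases(candidates, scores):
    """Rank candidate phrases (lists of words) by the sum of word scores."""
    ranked = [(" ".join(p), sum(scores.get(w, 0.0) for w in p)) for p in candidates]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy, pre-tokenised sentences (stop words assumed already removed).
    sentences = [
        ["graph", "based", "keyphrase", "extraction"],
        ["keyphrase", "extraction", "ranks", "words", "graph"],
        ["topic", "model", "guides", "keyphrase", "ranking"],
    ]
    # HYPOTHETICAL topic distributions; an LDA model would supply these.
    topics = {w: [0.5, 0.5] for s in sentences for w in s}
    topics["keyphrase"] = [0.9, 0.1]
    prior = {w: topic_entropy(t) for w, t in topics.items()}
    graph = build_graph(sentences, window=2, lam=0.5)
    scores = biased_pagerank(graph, prior=prior)
    candidates = [["keyphrase", "extraction"], ["topic", "model"], ["graph"]]
    for phrase, value in score_phrases(candidates, scores):
        print(f"{phrase}\t{value:.4f}")
```

The abstract does not say whether the entropy is used directly or transformed (for example inverted), so this sketch uses it directly as the restart weight. Summing word scores also favours longer candidates; averaging or length normalisation are common alternatives, and which one the thesis uses is not stated here.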

Keywords/Search Tags:

keyphrase extraction, word network, word-score calculation, graph convolution

Related items

1	Research On Keyphrase Extraction Algorithm Based On Word Embeddings Learning
2	Research On Graph-based Keyphrase Extraction Integrating Multiple Attributes
3	Research On Graph-based Keyphrase Extraction Method Integrating Multiple Features
4	Design And Implementation Of Text Resource Sharing System Based On Keyphrase Extraction
5	Research On Keyphrase Extraction Algorithm Based On Frequent Pattern Mining
6	Design And Implementation Of Core Word Extraction System In Search Engine
7	Research Of Product Attributes Extraction Technology Based On Bootstrapping
8	Incorporate Graph Network And Seq2seq For Keyphrase Extraction
9	A Study Of Word Vector Extraction Based On Neural Network
10	The Improved Extraction Word Model And Its Implementation Based On Word Boundary Characteristics