
Research On Graph-based Keyphrase Extraction Method Integrating Multiple Features

Posted on: 2019-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Chang
GTID: 2348330569988248
Subject: Computer Science and Technology
Abstract/Summary:
Keyphrases provide a high-level topical description of a document and are useful for a wide range of natural language processing tasks, such as information retrieval and question answering. Traditional graph-based keyphrase extraction approaches use only a few features and do not take advantage of the features proposed by supervised approaches. This thesis focuses on the integration of different features. A feature-oriented survey of keyphrase extraction algorithms is conducted and, based on an analysis of existing graph-based methods, two graph-based keyphrase extraction approaches that integrate multiple features are proposed. The work is summarized as follows:

A survey of features and keyphrase extraction methods. On the feature side, existing features are classified into four categories: word features, structural features of the word graph, topic features, and word embeddings. Each category is described in terms of its characteristics and usage. On the method side, existing methods are explained in detail, with particular attention to graph-based methods.

A PageRank-based parametric model that integrates multiple features for keyphrase extraction. Since existing graph-based approaches use only a few features, word features, structural features of the word graph, and topic features are integrated into a parametric model, and the feature parameters are learned by gradient descent. The method is applied to keyphrase extraction from research papers. The experiments show that it can use different kinds of features flexibly and achieves better results than CiteTextRank.

A PageRank-based method that uses word embeddings for keyphrase extraction. With the rise of deep learning, word embeddings have become an important semantic feature. Word embeddings are combined with other traditional features to weight the edges of the word graph, the tf-idf feature alone is used to weight the nodes, and the PageRank algorithm is then used to score the nodes of the word graph. This method is also used to extract keyphrases from research papers. Experiments show that word embeddings improve the extraction results and that the method achieves better results than WordAttractionRank and CiteTextRank.
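
The last approach (embedding-weighted edges, tf-idf-weighted nodes, PageRank scoring) can be illustrated with a minimal sketch. The abstract does not give the exact weighting formulas, so everything specific here is an assumption made for illustration: the function names `edge_weights` and `biased_pagerank`, the edge weight (co-occurrence count times non-negative cosine similarity), the use of normalized tf-idf as the restart distribution, and the toy numbers.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def edge_weights(tokens, embeddings, window=3):
    """Co-occurrence counts within `window` tokens, scaled by embedding similarity.

    Assumed weighting (not taken from the thesis): count * max(cosine, 0).
    """
    counts = defaultdict(float)
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            if left != right:
                counts[frozenset((left, right))] += 1.0
    weights = {}
    for pair, count in counts.items():
        a, b = tuple(pair)
        w = count * max(cosine(embeddings[a], embeddings[b]), 0.0)
        if w > 0.0:  # keep only edges that carry weight
            weights[pair] = w
    return weights


def biased_pagerank(weights, prior, damping=0.85, iterations=50):
    """Weighted PageRank whose restart distribution is the normalized node prior.

    `prior` maps every node to a positive score (here: an assumed tf-idf value).
    """
    nodes = sorted(prior)
    total = sum(prior.values())
    restart = {n: prior[n] / total for n in nodes}

    neighbors = defaultdict(dict)
    for pair, w in weights.items():
        a, b = tuple(pair)
        neighbors[a][b] = w
        neighbors[b][a] = w
    out = {n: sum(neighbors[n].values()) for n in nodes}

    score = dict(restart)
    for _ in range(iterations):
        # Mass held by nodes without edges is redistributed via the prior.
        dangling = sum(score[n] for n in nodes if out[n] == 0.0)
        score = {
            n: (1 - damping) * restart[n]
            + damping * (
                sum(score[m] * w / out[m] for m, w in neighbors[n].items())
                + dangling * restart[n]
            )
            for n in nodes
        }
    return score


# Toy data, invented purely for illustration.
tokens = ["graph", "keyphrase", "extraction", "graph", "model"]
embeddings = {
    "graph": [0.9, 0.1],
    "keyphrase": [0.7, 0.6],
    "extraction": [0.6, 0.7],
    "model": [0.2, 0.9],
}
tfidf = {"graph": 0.8, "keyphrase": 1.2, "extraction": 1.0, "model": 0.4}

scores = biased_pagerank(edge_weights(tokens, embeddings), tfidf)
for word, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{word}\t{value:.3f}")
```

Candidate keyphrases would then be assembled from the top-scoring words. That step, and the "other traditional features" the abstract mentions for the edges, are left out of this sketch because the abstract does not describe them.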
Keywords/Search Tags:keyphrase extraction, graph-based approaches, parametric model, word embedding, integrating multiple features