Research On Keyphrase Extraction Algorithm Based On Word Embeddings Learning

Posted on: 2020-09-24 | Degree: Master | Type: Thesis
Country: China | Candidate: H Liu
GTID: 2428330596494572 | Subject: Computer Science and Technology
Abstract/Summary:
Keyphrases distill the topical information of a text. They help readers quickly grasp the core content of an article and are widely used in information retrieval, question answering systems, text classification and other fields. Traditional graph-based keyphrase extraction methods consider only the global structural information of words in the word co-occurrence graph and ignore the latent semantic information of words in the sequence. Existing research shows that word embedding learning can effectively capture this latent semantic information. This study therefore focuses on incorporating word embeddings into graph-based keyphrase extraction and proposes a word embedding learning model for keyphrase extraction. The specific work is as follows.

(1) Word embeddings learned by general-purpose word embedding models are combined with the positional information of words in the document to improve graph-based keyphrase extraction. Existing graph-based methods do not adequately exploit the latent semantic information of words in the sequence. The proposed method combines word embeddings, which capture this latent semantics, with the positions at which words occur in the document, and modifies the PageRank algorithm so that words are scored more reasonably, thereby improving keyphrase extraction. The experiments learn word embeddings with three general-purpose models (Skip-gram, TWE-1 and fastText) and compare against five unsupervised keyphrase extraction methods. The results show that the proposed graph-based algorithm, which integrates word embeddings and positional information, outperforms the PositionRank method on all evaluation metrics.

(2) A novel word embedding learning model for keyphrase extraction is proposed and applied to graph-based keyphrase extraction. Because general-purpose word embedding models cannot effectively integrate the textual features of words, the proposed model first constructs a heterogeneous text graph from the text content and its topic information, then trains embeddings on this heterogeneous graph, and finally obtains an embedding for each word. The graph-based keyphrase extraction method incorporates these learned embeddings to score words better. In addition, because existing phrase scoring methods cannot effectively exploit the correlations among words, a word embedding-based phrase scoring method is proposed, which uses the distances between words in the vector space to score phrases more reasonably. Experimental comparisons with eight unsupervised keyphrase extraction methods show that the proposed algorithm scores both words and phrases better, thus improving keyphrase extraction.
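The abstract does not give the exact weighting formulas, so the following is only a minimal sketch of the two ideas it describes, under stated assumptions. Word scores come from a PageRank variant whose random-jump (teleport) vector is a PositionRank-style inverse-position prior and whose co-occurrence edges are weighted by embedding similarity. Phrase scores are then damped by the distances between the phrase's word vectors. The edge weight (cos + 1) / 2, the damping factor d = 0.85, the 3-token window, the phrase formula (sum of word scores divided by 1 + mean pairwise Euclidean distance), the function names `biased_pagerank` and `phrase_score`, and the toy 2-D vectors are all hypothetical choices for illustration, not the thesis's own formulas. In practice the vectors would come from Skip-gram, TWE-1, fastText or the proposed heterogeneous-graph model.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_graph(tokens, emb, window):
    """Undirected co-occurrence graph over words within `window` consecutive tokens.
    ASSUMPTION: each co-occurrence adds (cos + 1) / 2, so semantically closer
    word pairs get heavier (and always non-negative) edges."""
    weights = defaultdict(float)
    for i, w in enumerate(tokens):
        for u in tokens[i + 1:i + window]:
            if u == w:
                continue
            affinity = (cosine(emb[w], emb[u]) + 1.0) / 2.0
            weights[(w, u)] += affinity
            weights[(u, w)] += affinity
    return weights


def position_prior(tokens):
    """PositionRank-style teleport vector: sum of 1/position over each word's occurrences."""
    prior = defaultdict(float)
    for pos, w in enumerate(tokens, start=1):
        prior[w] += 1.0 / pos
    total = sum(prior.values())
    return {w: p / total for w, p in prior.items()}


def biased_pagerank(tokens, emb, window=3, d=0.85, iters=100, tol=1e-10):
    """Power iteration for S = (1 - d) * prior + d * M S on the weighted graph (hypothetical)."""
    weights = build_graph(tokens, emb, window)
    prior = position_prior(tokens)
    out_sum = defaultdict(float)   # total outgoing edge weight per word
    incoming = defaultdict(list)   # word -> [(neighbour, edge weight)]
    for (w, u), x in weights.items():
        out_sum[w] += x
        incoming[u].append((w, x))
    score = dict(prior)
    for _ in range(iters):
        new = {
            v: (1 - d) * prior[v]
            + d * sum(score[w] * x / out_sum[w] for w, x in incoming[v])
            for v in prior
        }
        delta = sum(abs(new[v] - score[v]) for v in prior)
        score = new
        if delta < tol:
            break
    return score


def phrase_score(phrase, word_score, emb):
    """HYPOTHETICAL phrase scoring: summed word scores divided by
    1 + mean pairwise Euclidean distance between the phrase's word vectors."""
    words = phrase.split()
    base = sum(word_score[w] for w in words)
    if len(words) < 2:
        return base
    dists = [math.dist(emb[a], emb[b]) for i, a in enumerate(words) for b in words[i + 1:]]
    return base / (1.0 + sum(dists) / len(dists))


# Toy 2-D embeddings, made up for illustration; real ones would be learned.
emb = {
    "keyphrase": [1.0, 0.0], "extraction": [0.9, 0.1], "graph": [0.5, 0.5],
    "word": [0.0, 1.0], "embedding": [0.1, 0.9],
}
tokens = ["keyphrase", "extraction", "graph", "word", "embedding",
          "graph", "keyphrase", "extraction"]

scores = biased_pagerank(tokens, emb)
for w, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{w:12s}{s:.4f}")
for p in ["keyphrase extraction", "word embedding", "keyphrase word"]:
    print(p, round(phrase_score(p, scores, emb), 4))
```

Mapping cosine similarity into [0, 1] keeps every edge weight non-negative, which the random-walk interpretation of PageRank requires. The distance-based damping means that a phrase whose words lie far apart in the vector space (such as "keyphrase word" in the toy data) keeps a smaller fraction of its summed word scores than a semantically cohesive phrase such as "keyphrase extraction".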
Keywords/Search Tags: keyphrase extraction, word embedding, PageRank, word scoring, phrase scoring