
Research On Keyword Extraction Integrating Multiple Attributes

Posted on: 2021-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: C Su
Full Text: PDF
GTID: 2428330611968908
Subject: Computer technology
Abstract/Summary:
Keywords are words or phrases that reflect the subject of a document and concisely summarize its main content. Extracted keywords are widely used in tasks such as document retrieval, text categorization, and text topic mining. Previous graph-based automatic keyword extraction methods focus on how to score the words in the word graph accurately, while neglecting the other important steps of keyphrase extraction, and they explore fusion methods for only a small number of features. Based on a systematic analysis of the attributes of keywords, this thesis summarizes these shortcomings and proposes graph-based keyphrase extraction approaches that integrate multiple features. The work consists of three parts.

First, the thesis proposes the features that affect the importance of words and reviews existing methods within a general keyword extraction framework, with particular attention to graph-based methods.

Second, based on the PageRank algorithm, the thesis scores the words in the word graph and designs different methods that integrate multiple attributes to score and rank candidate phrases for keyword extraction. In traditional graph-based methods, candidate phrase scores are strongly affected by the scores of the constituent words and by the length of the phrase. This part of the work combines the frequency and position information of a phrase in the text and varies how the feature values are calculated and combined in order to find the best phrase scoring method. The experimental results show that the selected method outperforms the comparison methods, including traditional keyphrase extraction, on three types of data sets, and thus improves the keyword extraction results.

Third, a general word embedding model is used to learn word representations, which are applied to improve the PageRank-based word scoring algorithm by combining the relationships between words with word position features. Current graph-based keyword extraction methods lose the underlying semantic relationships between words. This thesis uses the attributes between words together with the word embedding model to weight the edges of the word graph, uses the position feature to weight the nodes, and modifies the random walk model accordingly. The experimental part replaces the word scoring algorithm of Chapter 3 with the improved word scoring method and explores the effects of word scoring and phrase scoring on keyword extraction under different approaches. Compared with other unsupervised keyword extraction algorithms, the experimental results show that the improved word scoring method scores words and phrases better, thereby improving the performance of keyword extraction.
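
The pipeline described in the abstract (score words with PageRank on a word graph, bias the walk by word position, then score candidate phrases from word scores, frequency, and position) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the co-occurrence window, the 1/(index+1) position weight, the multiplicative phrase formula, and the function names `build_graph`, `position_bias`, `biased_pagerank`, and `score_phrase`. Edge weights are plain co-occurrence counts; the embedding-based edge weighting mentioned in the abstract is omitted, as is stop-word filtering.

```python
from collections import defaultdict


def build_graph(tokens, window=2):
    """Undirected co-occurrence graph: weight[u][v] counts how often
    u and v appear within `window` tokens of each other."""
    weight = defaultdict(lambda: defaultdict(float))
    for i, u in enumerate(tokens):
        for v in tokens[i + 1 : i + window + 1]:
            if u != v:
                weight[u][v] += 1.0
                weight[v][u] += 1.0
    return weight


def position_bias(tokens):
    """Teleport distribution that favours early words (assumed form:
    sum of 1/(index+1) over all occurrences, normalized to sum to 1)."""
    raw = defaultdict(float)
    for i, w in enumerate(tokens):
        raw[w] += 1.0 / (i + 1)
    total = sum(raw.values())
    return {w: s / total for w, s in raw.items()}


def biased_pagerank(weight, bias, damping=0.85, iterations=50):
    """Power iteration for PageRank with a non-uniform teleport vector."""
    nodes = list(bias)
    score = {w: 1.0 / len(nodes) for w in nodes}
    out_sum = {w: sum(weight[w].values()) for w in nodes}
    for _ in range(iterations):
        new = {}
        for w in nodes:
            # Mass flowing into w from each neighbour u, split by edge weight.
            incoming = sum(
                score[u] * weight[u][w] / out_sum[u]
                for u in weight[w]
                if out_sum[u] > 0
            )
            new[w] = (1 - damping) * bias[w] + damping * incoming
        score = new
    return score


def score_phrase(phrase, word_score, tokens):
    """Hypothetical phrase score: length-normalized word score, multiplied
    by phrase frequency and boosted by an early first occurrence."""
    words = phrase.split()
    n = len(words)
    starts = [i for i in range(len(tokens) - n + 1) if tokens[i : i + n] == words]
    if not starts:
        return 0.0
    mean_word = sum(word_score.get(w, 0.0) for w in words) / n
    frequency = len(starts)
    first_position = 1.0 / (starts[0] + 1)
    return mean_word * frequency * (1.0 + first_position)


if __name__ == "__main__":
    text = ("graph based keyword extraction ranks words in a word graph "
            "then scores candidate phrases for keyword extraction")
    tokens = text.split()
    scores = biased_pagerank(build_graph(tokens), position_bias(tokens))
    for phrase in ("keyword extraction", "candidate phrases", "word graph"):
        print(phrase, round(score_phrase(phrase, scores, tokens), 4))
```

Averaging the word scores rather than summing them is one simple way to reduce the dependence on phrase length that the abstract identifies in traditional graph-based methods; other normalizations are equally plausible.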
Keywords/Search Tags: keyword extraction, graph-based model, integrating multiple attributes, word vector