Research On Keyword Extraction Integrating Multiple Attributes

Posted on:2021-05-12

Degree:Master

Type:Thesis

Country:China

Candidate:C Su

Full Text:PDF

GTID:2428330611968908

Subject:Computer technology

Abstract/Summary:

Keywords are words or phrases that reflect the subject of a document and concisely summarize its main content. Extracted keywords are widely used in tasks such as document retrieval, text categorization, and text topic mining. Previous graph-based automatic keyword extraction techniques focus on how to score the words in the word graph accurately, while neglecting the other important steps of keyphrase extraction and exploring fusion methods for only a small number of features. Based on a systematic analysis of the attribute characteristics of keywords, this thesis summarizes these shortcomings and proposes graph-based keyphrase extraction approaches that integrate multiple features. The thesis work includes the following.

First, it proposes the features that affect the importance of words and reviews existing methods within a general keyword extraction framework, especially graph-based methods.

Second, based on the PageRank algorithm, it scores the words in the word graph and designs different methods that integrate multiple attributes to score and rank candidate phrases for keyword extraction. In traditional graph-based methods, the scores of candidate phrases are strongly affected by the scores of their words and by the length of the phrase. In this chapter, we combine the frequency and position information of the phrase in the text, and change how the feature values are calculated so that they combine better, in order to find the best phrase scoring method. The experimental results show that the appropriate method surpasses the other comparison methods and traditional keyphrase extraction on three types of data sets, improving the keyword extraction results.

Third, it uses a general word embedding model to learn word representations and applies them to improve the PageRank-based word scoring algorithm by combining the relationships between words with word position features. Current graph-based keyword extraction methods lose the underlying semantic relationships between words. In this thesis, we use the attributes between words and a word embedding model to weight the edges of the word graph, use the position feature to weight the nodes, and then modify the random walk model. The experimental part replaces the word scoring algorithm in Chapter 3 with the improved word scoring method, and explores the effects of word scoring and phrase scoring on keyword extraction under different approaches. Compared with other unsupervised keyword extraction algorithms, the experimental results show that the improved method scores words and phrases better, thus improving the performance of keyword extraction.
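The general pipeline described in the abstract (score words with a position-biased PageRank on a word graph, then score candidate phrases using frequency and position) could be sketched roughly as follows. This is only an illustrative sketch, not the thesis's actual method: the co-occurrence window, the damping factor, the 1/position prior, and the phrase-score formula are all assumptions chosen for brevity, and edge weights here are plain co-occurrence counts rather than the embedding-based weights the thesis describes. All function names are hypothetical.

```python
from collections import defaultdict


def build_word_graph(tokens, window=3):
    """Undirected graph linking each token to the next (window - 1) tokens.
    Edge weight = co-occurrence count (assumption; the thesis also uses embeddings)."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word][other] += 1.0
                graph[other][word] += 1.0
    return graph


def position_prior(tokens):
    """Assumed node prior: sum of 1/position over a word's occurrences, normalized to 1."""
    raw = defaultdict(float)
    for i, word in enumerate(tokens):
        raw[word] += 1.0 / (i + 1)
    total = sum(raw.values())
    return {word: value / total for word, value in raw.items()}


def biased_pagerank(graph, prior, damping=0.85, iterations=50):
    """PageRank whose teleport step follows the position prior instead of a uniform one."""
    out_weight = {v: sum(nbrs.values()) for v, nbrs in graph.items()}
    scores = dict(prior)
    for _ in range(iterations):
        updated = {}
        for word in prior:
            # The graph is symmetric, so the weight of edge v -> word equals word -> v.
            inflow = sum(scores[v] * w / out_weight[v]
                         for v, w in graph.get(word, {}).items())
            updated[word] = (1 - damping) * prior[word] + damping * inflow
        scores = updated
    return scores


def score_phrases(tokens, candidates, word_scores):
    """Assumed phrase score: mean word score * frequency / (1 + first position).
    Using the mean rather than the sum reduces the bias toward long phrases."""
    results = {}
    for phrase in candidates:
        n = len(phrase)
        starts = [i for i in range(len(tokens) - n + 1)
                  if tuple(tokens[i:i + n]) == tuple(phrase)]
        if not starts:
            continue  # the candidate never occurs in the text
        mean_word = sum(word_scores[w] for w in phrase) / n
        results[tuple(phrase)] = mean_word * len(starts) / (1 + starts[0])
    return results
```

A caller would tokenize the text, pass the tokens through `build_word_graph`, `position_prior`, and `biased_pagerank`, and then hand the resulting word scores together with a list of candidate phrases to `score_phrases`. Candidate generation (for example, selecting noun phrases) is left out of the sketch.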

Keywords/Search Tags:

keyword extraction, graph-based model, integrating multiple attributes, word vector

Related items

1	Research On Graph-based Keyphrase Extraction Method Integrating Multiple Features
2	Automatic Keyword Extraction Algorithms Based On Word Embedding And Multiple Features Fusion
3	Research On Keyword Extraction Method Based On Document Topical Structure And Word Graph Iteration
4	Chinese Text Keyword Extraction Algorithm Based On Graph And LDA
5	Multiple Documents Automatically Summary Based On Semantic Word Vector
6	Research On Graph-based Text Keyword Extraction Integrating Deep Learning
7	Research On Keyword Extraction Method Based On Semantics Features
8	Research And Application Of Collaborative Filtering Algorithm Based On Keyword Extraction Technology
9	Research On Keyword Extraction From Chinese News Web Pages Based On Compose Features
10	Research And Implementation Of News Keyword Extraction Method Based On Semantic Clustering And Weighted TextRank