Research On Unsupervised Keyword Extraction Based On Semantic And Textual Features

Posted on: 2024-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: C Gu
Full Text: PDF
GTID: 2568307118450284
Subject: Engineering
Abstract/Summary:
Keyword extraction is an important research direction in natural language processing. It is a technique for automatically extracting core themes and key content from text, and its methods can be divided into supervised and unsupervised approaches. This thesis addresses the poor extraction performance of unsupervised keyword extraction caused by weak integration of contextual semantics, semantically redundant extracted keywords, and insufficient integration of semantic and text features. Building on existing research, it studies unsupervised keyword extraction methods based on pre-trained semantic embeddings and text features in order to improve extraction performance. The main contributions of this thesis are as follows:
(1) To address the weak use of contextual semantic information and the extraction of semantically similar keywords in existing semantic-based methods, an unsupervised keyword extraction method based on a pre-trained language model is proposed. The method uses the hidden layers of the BERT pre-trained model to obtain word and document embeddings that carry contextual semantic information, which strengthens semantic representation in the extraction task. A candidate-word semantic deduplication module based on clustering of semantic embeddings is also added, which reduces repeated extraction of semantically similar keywords and improves extraction quality.
(2) To further strengthen the combination of text features with semantic features, and to make feature consideration in unsupervised keyword extraction more comprehensive, a graph-based method combined with text features is proposed. The method constructs a network graph from word embeddings, combining the semantic and text features of the full text and of the candidate words. It then introduces a text feature weight calculation module that integrates the weights of several types of text feature information and applies feature weighting in candidate word scoring, which strengthens the contribution of each feature dimension to the scoring step and improves extraction results.
(3) To verify the effectiveness of the proposed methods, this thesis collected abstracts and short news texts of various types from the Internet as verification datasets and evaluated both methods on them. The effectiveness of the improved modules and of the parameter selection was verified through comparative experiments and case analysis. The experimental results show that the proposed methods are effective for the unsupervised keyword extraction task and achieve better extraction results than the comparison methods.
Keywords/Search Tags: Unsupervised keyword extraction, Pre-trained model, Semantic embedding, Text features
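The general idea behind the two methods in the abstract (score candidates by semantic similarity to the document, weight the score by text features, and suppress semantically similar candidates) can be illustrated with a minimal sketch. This is not the thesis's implementation: the vectors below are hand-made toy values standing in for BERT embeddings, the text-feature weights are hypothetical, a simple multiplicative weight replaces the graph-based scoring, and a greedy similarity threshold replaces the clustering-based deduplication module. The function name `rank_and_deduplicate` and its parameters are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_and_deduplicate(doc_vec, candidates, feature_weights,
                         sim_threshold=0.9, top_k=3):
    """Rank candidates by weighted similarity to the document, dropping near-duplicates.

    candidates: dict mapping phrase -> embedding (list of floats)
    feature_weights: dict mapping phrase -> hypothetical text-feature weight
                     (for example derived from frequency or position);
                     phrases without an entry default to 1.0
    """
    scored = sorted(
        ((feature_weights.get(p, 1.0) * cosine(vec, doc_vec), p)
         for p, vec in candidates.items()),
        reverse=True,
    )
    selected = []
    for _score, phrase in scored:
        # Greedy deduplication: keep a candidate only if it is not too
        # similar to any candidate that has already been kept.
        if all(cosine(candidates[phrase], candidates[kept]) < sim_threshold
               for kept in selected):
            selected.append(phrase)
        if len(selected) == top_k:
            break
    return selected


# Toy embeddings (assumed values, not produced by any real model).
doc_vec = [1.0, 1.0, 0.0]
candidates = {
    "keyword extraction": [1.0, 0.9, 0.0],
    "keyphrase extraction": [1.0, 0.95, 0.05],
    "semantic embedding": [0.2, 1.0, 0.1],
    "news corpus": [0.0, 0.1, 1.0],
}
feature_weights = {"keyword extraction": 1.2, "news corpus": 0.8}

result = rank_and_deduplicate(doc_vec, candidates, feature_weights)
print(result)
```

With these toy values, "keyphrase extraction" is dropped because its embedding is nearly identical to that of the higher-scoring "keyword extraction", which is the kind of redundancy the deduplication module in contribution (1) is meant to reduce.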