Keyphrase extraction is the task of extracting phrases from a target text that effectively summarize its main content. It not only provides users with a phrase-level summary of the text but also serves as an additional feature that strongly influences the performance of downstream natural language processing tasks. Unsupervised keyphrase extraction methods based on vector embeddings are a popular research direction because of their strong interpretability, good extraction performance, and independence from annotated datasets. However, most existing methods rely on bag-of-words embedding models and distance-based similarity: they treat keyphrases independently of the surrounding text, ignore their contextual information, and apply the same computation parameters to every document, which makes it difficult to summarize the main theme of the text. In addition, with the development of pre-trained language models, supervised methods built on them have demonstrated powerful keyphrase extraction capabilities, but they often reduce the extraction task to sequence labeling and use binary classification to decide whether each candidate phrase is a keyphrase. Judging candidates independently in this way overlooks the global semantic relationship between the candidate phrases and the original text, so highly summarizing keyphrases cannot be extracted. This thesis therefore conducts the following targeted research.

(1) To address the shortcomings of embedding-based unsupervised models, which ignore the contextual information of keyphrases and cannot handle different features in a targeted way, this thesis proposes an unsupervised keyphrase extraction model based on bidirectional multi-granularity attention. In this model, candidate phrase features use multi-granularity cross-attention to learn contextual semantic information from the original text features, and conversely, the original text features use multi-granularity cross-attention to assign targeted attention scores to the candidate phrase features, as sketched below. The attention scores are used directly for keyphrase extraction and also weight the candidate phrase features for a downstream prediction task, whose loss serves as the supervision signal for training. Comparative experiments demonstrate the effectiveness of the model and its core module.

(2) To address the insufficient consideration of the global semantic correlation between the text and its keyphrases in extraction methods based on pre-trained language models, this thesis proposes a keyphrase extraction model based on a dual-tower pre-trained model. The model contains two independent BERT networks, a candidate matrix generation network and a text vector generation network, which embed the features of the candidate phrases and of the original text, respectively. Cross-attention then combines the two sets of features and produces targeted scores for the candidate phrases, as shown in the second sketch below. The model's effectiveness is validated on three test sets for keyphrase extraction.
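A minimal PyTorch sketch of the bidirectional cross-attention idea in (1) is shown below. The class and tensor names (BidirectionalCrossAttention, phrase_feats, doc_feats), the use of nn.MultiheadAttention, and the way attention mass is pooled into candidate scores are illustrative assumptions rather than the thesis implementation, and the multi-granularity design is not reproduced here.

# Hypothetical sketch: candidate phrases attend to the document to gather
# context, and the document attends back to score each candidate.
import torch
import torch.nn as nn

class BidirectionalCrossAttention(nn.Module):
    def __init__(self, dim: int = 768, num_heads: int = 8):
        super().__init__()
        # phrase -> text: candidates absorb contextual semantics from the document
        self.phrase_to_text = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # text -> phrase: the document assigns targeted attention to each candidate
        self.text_to_phrase = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, phrase_feats: torch.Tensor, doc_feats: torch.Tensor):
        # phrase_feats: (batch, num_candidates, dim); doc_feats: (batch, num_tokens, dim)
        contextual_phrases, _ = self.phrase_to_text(
            query=phrase_feats, key=doc_feats, value=doc_feats)
        _, attn_weights = self.text_to_phrase(
            query=doc_feats, key=contextual_phrases, value=contextual_phrases)
        # Average attention mass each candidate receives from the document tokens;
        # higher mass indicates a stronger candidate keyphrase.
        candidate_scores = attn_weights.mean(dim=1)  # (batch, num_candidates)
        return contextual_phrases, candidate_scores

In a setup like this, the scores could both rank candidates directly and weight the candidate features passed to a downstream prediction task whose loss supervises training, in line with the description in (1).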
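The dual-tower scoring of (2) can be sketched in a similar spirit: two independent BERT encoders embed the candidate phrases and the full text, and cross-attention over the two feature sets scores the candidates. The model name, the [CLS]-vector pooling, and the score aggregation below are assumptions made for illustration, not the thesis implementation.

# Hypothetical sketch of dual-tower candidate scoring with two BERT encoders.
import torch
import torch.nn as nn
from transformers import AutoModel

class DualTowerScorer(nn.Module):
    def __init__(self, model_name: str = "bert-base-uncased", num_heads: int = 8):
        super().__init__()
        # Candidate matrix generation network and text vector generation network
        self.candidate_encoder = AutoModel.from_pretrained(model_name)
        self.text_encoder = AutoModel.from_pretrained(model_name)
        dim = self.text_encoder.config.hidden_size
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, candidate_inputs: dict, text_inputs: dict) -> torch.Tensor:
        # One [CLS] vector per candidate phrase: (num_candidates, dim)
        cand_vecs = self.candidate_encoder(**candidate_inputs).last_hidden_state[:, 0]
        cand_vecs = cand_vecs.unsqueeze(0)                              # (1, num_candidates, dim)
        text_vecs = self.text_encoder(**text_inputs).last_hidden_state  # (1, num_tokens, dim)
        # Document tokens attend to the candidates; accumulated attention is the score.
        _, attn_weights = self.cross_attn(query=text_vecs, key=cand_vecs, value=cand_vecs)
        return attn_weights.mean(dim=1).squeeze(0)                      # (num_candidates,)

A typical call would tokenize the candidate phrases as one padded batch and the document as a single sequence (for example with AutoTokenizer from the transformers library) before passing both encodings to the scorer.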
(3) Building on the two methods above, this thesis develops a complaint information early warning system based on keyphrase extraction. The system can model complaint corpora in different scenarios, automatically detect and classify the corpus, and provide keyphrase information to assist users in verifying the complaint classification.