Text classification is a fundamental task in natural language processing. In recent years, graph attention networks and pre-trained models have been widely used in text classification research. In text multi-label classification, several problems remain worthy of investigation. On the one hand, some local semantic features appear frequently in the text and provide useful cues for label prediction, yet pre-trained models are weak at capturing local feature semantics in the multi-label setting. This thesis addresses the problem by designing heterogeneous graph attention networks that obtain higher-order neighborhood information from the corpus, extract local semantic features of the text, and uncover latent semantic information. On the other hand, the labels of a text are correlated: when one label appears, the labels related to it are more likely to appear as well. To strengthen the connections between labels and optimize the label node representations, this thesis introduces contrastive learning to construct positive and negative samples of label nodes and adds label-label edges to the heterogeneous graph attention network. The specific work of the thesis is as follows.

(1) ML-BHGAT-SIF, a model for text multi-label classification based on a heterogeneous graph attention network fused with feature-word information, is designed. The model improves on the DocBERT model. In the text representation module, contextual semantic features are obtained with the BERT pre-trained language model; to further capture the contextual relationships of the text, a heterogeneous graph attention network with three node types (words, feature words, and labels) is designed to extract the local semantic features of the text. In the text multi-label classification module, the contextual semantic features are concatenated with the local semantic features and passed into a linear layer for classification. Experiments on four datasets (SemEval-2018, Ren-CECps, RCV1-V2, and AAPD) show that ML-BHGAT-SIF improves Macro-F1 by 3.8%, 0.3%, 1.5%, and 1.7%, respectively, over the DocBERT baseline, and Micro-F1 by 1.9%, 4.9%, 0.5%, and 0.8%, respectively.

(2) ML-BHGAT-CLC, a model for text multi-label classification based on a heterogeneous graph attention network fused with label relevance, is designed. The model improves on the ML-BHGAT-SIF model. In the text representation module, new label-node-to-label-node edges are added to the heterogeneous graph attention network, and contrastive learning is introduced to construct positive and negative samples for label nodes in the heterogeneous graph, further capturing the relationships between labels and enhancing the text representation; the classification module is the same as in ML-BHGAT-SIF. Experiments on the four datasets SemEval-2018, Ren-CECps, AAPD, and RCV1-V2 show that ML-BHGAT-CLC improves Macro-F1 by 7.9%, 2.0%, 4.6%, and 5.5%, respectively, over the DocBERT baseline, and by 1.6%, 1.6%, 2.2%, and 2.5%, respectively, over ML-BHGAT-SIF; in Micro-F1 it improves by 2.3%, 2.2%, 3.5%, and 2.3%, respectively, over DocBERT, and by 0.4%, 1.8%, 1.1%, and 1.4%, respectively, over ML-BHGAT-SIF.

The two designed models are also compared with other existing text multi-label classification models such as FastText, Seq2Seq, and SGM. The experimental results show that the models in this thesis outperform the other models in both the Macro-F1 and Micro-F1 metrics on the above datasets.
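The fusion step in the ML-BHGAT-SIF classification module (concatenating BERT contextual features with graph-derived local features before a linear layer) can be sketched as follows. The dimensions, class name, and label count here are illustrative assumptions, not the thesis's actual configuration.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Minimal sketch: concatenate BERT contextual features with local
    features from the heterogeneous graph attention network, then
    classify with a single linear layer (sigmoid per label, since
    multi-label outputs are not mutually exclusive)."""

    def __init__(self, bert_dim=768, graph_dim=256, num_labels=54):
        super().__init__()
        self.linear = nn.Linear(bert_dim + graph_dim, num_labels)

    def forward(self, bert_feat, graph_feat):
        # [batch, bert_dim + graph_dim] after concatenation
        fused = torch.cat([bert_feat, graph_feat], dim=-1)
        # independent per-label probabilities
        return torch.sigmoid(self.linear(fused))

model = FusionClassifier()
probs = model(torch.randn(2, 768), torch.randn(2, 256))
```

A sigmoid (rather than softmax) output is the standard choice for multi-label classification, since each label is predicted independently.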
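The label-label edges added in ML-BHGAT-CLC encode the observation that related labels tend to co-occur. A common construction (an assumption here; the thesis's exact edge criterion is not stated in this abstract) is to link two label nodes whenever their labels co-occur in some training document:

```python
from collections import defaultdict
from itertools import combinations

def build_label_label_edges(doc_labels, min_cooccurrence=1):
    """Count label co-occurrence across documents; each label pair that
    co-occurs at least `min_cooccurrence` times becomes a label-label
    edge in the heterogeneous graph. Sketch only: thresholds and
    normalization are illustrative choices."""
    cooc = defaultdict(int)
    for labels in doc_labels:
        for a, b in combinations(sorted(set(labels)), 2):
            cooc[(a, b)] += 1
    return {pair for pair, count in cooc.items() if count >= min_cooccurrence}

edges = build_label_label_edges(
    [["joy", "love"], ["joy", "optimism"], ["joy", "love"]]
)
```

Raising `min_cooccurrence` (or normalizing counts into co-occurrence probabilities) filters out spurious pairs that co-occur only by chance.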
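The contrastive learning step over label nodes can be sketched with an InfoNCE-style loss that pulls a label-node embedding toward its positive sample and pushes it away from negatives. The sampling scheme, temperature, and dimensions below are illustrative assumptions; the abstract does not specify the thesis's exact formulation.

```python
import torch
import torch.nn.functional as F

def label_node_contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for one label node (sketch).

    anchor:    [dim]    embedding of a label node
    positive:  [dim]    its positive sample
    negatives: [k, dim] k negative samples
    """
    anchor = F.normalize(anchor, dim=-1)
    pos = F.normalize(positive, dim=-1)
    negs = F.normalize(negatives, dim=-1)
    # cosine similarities scaled by temperature
    pos_sim = (anchor * pos).sum() / temperature        # scalar
    neg_sim = negs @ anchor / temperature               # [k]
    # positive sits at index 0; cross-entropy maximizes its probability
    logits = torch.cat([pos_sim.unsqueeze(0), neg_sim]).unsqueeze(0)
    target = torch.zeros(1, dtype=torch.long)
    return F.cross_entropy(logits, target)

loss = label_node_contrastive_loss(
    torch.randn(64), torch.randn(64), torch.randn(8, 64)
)
```

Minimizing this loss makes related label nodes (the positives) cluster in embedding space, which is what lets a predicted label raise the score of its correlated labels.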