Text classification is a fundamental task in natural language processing. In recent years, graph attention networks and pre-trained models have been widely used in text classification research. In text multi-label classification, several problems remain worthy of investigation. On the one hand, some local semantic features appear frequently in the text and provide useful cues for label prediction, yet pre-trained models are weak at capturing local feature semantics in the multi-label setting. This thesis addresses the problem by designing heterogeneous graph attention networks that obtain higher-order neighborhood information from the corpus, extract local semantic features of the text, and uncover latent semantic information. On the other hand, the labels of a text are correlated: when one label appears, the labels related to it are more likely to appear as well. To strengthen the connections between labels and optimize the label node representations, this thesis introduces contrastive learning to construct positive and negative samples of label nodes and adds label-label edges to the heterogeneous graph attention network. The specific work of the thesis is as follows.

(1) ML-BHGAT-SIF, a model for text multi-label classification based on a heterogeneous graph attention network fused with feature-word information, is designed. The model improves on the DocBERT model. In the text representation module, contextual semantic features are obtained with the BERT pre-trained language model; to further capture the contextual relationships of the text, a heterogeneous graph attention network with three node types (words, feature words, and labels) is designed to extract the local semantic features of the text. In the text multi-label classification module, the contextual semantic features are concatenated with the local semantic features and passed into a linear layer for classification. Experiments on four datasets (SemEval-2018, Ren-CECps, RCV1-V2, and AAPD) show that ML-BHGAT-SIF improves Macro-F1 by 3.8%, 0.3%, 1.5%, and 1.7%, respectively, over the DocBERT baseline, and Micro-F1 by 1.9%, 4.9%, 0.5%, and 0.8%, respectively.

(2) ML-BHGAT-CLC, a model for text multi-label classification based on a heterogeneous graph attention network fused with label relevance, is designed. The model improves on the ML-BHGAT-SIF model. In the text representation module, new label-node-to-label-node edges are added to the heterogeneous graph attention network, and contrastive learning is introduced to construct positive and negative samples for label nodes in the heterogeneous graph, further capturing the relationships between labels and enhancing the text representation; the classification module is the same as in ML-BHGAT-SIF. Experiments on the four datasets SemEval-2018, Ren-CECps, AAPD, and RCV1-V2 show that ML-BHGAT-CLC improves Macro-F1 by 7.9%, 2.0%, 4.6%, and 5.5%, respectively, over the DocBERT baseline, and by 1.6%, 1.6%, 2.2%, and 2.5%, respectively, over ML-BHGAT-SIF; in Micro-F1 it improves by 2.3%, 2.2%, 3.5%, and 2.3%, respectively, over DocBERT, and by 0.4%, 1.8%, 1.1%, and 1.4%, respectively, over ML-BHGAT-SIF.

The two designed models are also compared with other existing text multi-label classification models such as FastText, Seq2Seq, and SGM. The experimental results show that the models in this thesis outperform the other models in both the Macro-F1 and Micro-F1 metrics on the above datasets.
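The fusion step in the ML-BHGAT-SIF classification module (concatenating BERT contextual features with graph-derived local features before a linear layer) can be sketched as follows. The dimensions, class name, and label count here are illustrative assumptions, not the thesis's actual configuration.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Minimal sketch: concatenate BERT contextual features with local
    features from the heterogeneous graph attention network, then
    classify with a single linear layer (sigmoid per label, since
    multi-label outputs are not mutually exclusive)."""

    def __init__(self, bert_dim=768, graph_dim=256, num_labels=54):
        super().__init__()
        self.linear = nn.Linear(bert_dim + graph_dim, num_labels)

    def forward(self, bert_feat, graph_feat):
        # [batch, bert_dim + graph_dim] after concatenation
        fused = torch.cat([bert_feat, graph_feat], dim=-1)
        # independent per-label probabilities
        return torch.sigmoid(self.linear(fused))

model = FusionClassifier()
probs = model(torch.randn(2, 768), torch.randn(2, 256))
```

A sigmoid (rather than softmax) output is the standard choice for multi-label classification, since each label is predicted independently.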
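The label-label edges added in ML-BHGAT-CLC encode the observation that related labels tend to co-occur. A common construction (an assumption here; the thesis's exact edge criterion is not stated in this abstract) is to link two label nodes whenever their labels co-occur in some training document:

```python
from collections import defaultdict
from itertools import combinations

def build_label_label_edges(doc_labels, min_cooccurrence=1):
    """Count label co-occurrence across documents; each label pair that
    co-occurs at least `min_cooccurrence` times becomes a label-label
    edge in the heterogeneous graph. Sketch only: thresholds and
    normalization are illustrative choices."""
    cooc = defaultdict(int)
    for labels in doc_labels:
        for a, b in combinations(sorted(set(labels)), 2):
            cooc[(a, b)] += 1
    return {pair for pair, count in cooc.items() if count >= min_cooccurrence}

edges = build_label_label_edges(
    [["joy", "love"], ["joy", "optimism"], ["joy", "love"]]
)
```

Raising `min_cooccurrence` (or normalizing counts into co-occurrence probabilities) filters out spurious pairs that co-occur only by chance.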
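The contrastive learning step over label nodes can be sketched with an InfoNCE-style loss that pulls a label-node embedding toward its positive sample and pushes it away from negatives. The sampling scheme, temperature, and dimensions below are illustrative assumptions; the abstract does not specify the thesis's exact formulation.

```python
import torch
import torch.nn.functional as F

def label_node_contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for one label node (sketch).

    anchor:    [dim]    embedding of a label node
    positive:  [dim]    its positive sample
    negatives: [k, dim] k negative samples
    """
    anchor = F.normalize(anchor, dim=-1)
    pos = F.normalize(positive, dim=-1)
    negs = F.normalize(negatives, dim=-1)
    # cosine similarities scaled by temperature
    pos_sim = (anchor * pos).sum() / temperature        # scalar
    neg_sim = negs @ anchor / temperature               # [k]
    # positive sits at index 0; cross-entropy maximizes its probability
    logits = torch.cat([pos_sim.unsqueeze(0), neg_sim]).unsqueeze(0)
    target = torch.zeros(1, dtype=torch.long)
    return F.cross_entropy(logits, target)

loss = label_node_contrastive_loss(
    torch.randn(64), torch.randn(64), torch.randn(8, 64)
)
```

Minimizing this loss makes related label nodes (the positives) cluster in embedding space, which is what lets a predicted label raise the score of its correlated labels.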