Graph Neural Network Text Classification And Domain Adaptation Algorithm Based On Adversarial Training

Posted on:2023-08-30

Degree:Master

Type:Thesis

Country:China

Candidate:B Zeng

GTID:2558306914980999

Subject:Electronic and communication engineering

Abstract/Summary:

In recent years, deep learning has become the mainstream approach to multi-label, hierarchical, fine-grained classification of large-scale electronic text. However, large-scale hierarchical multi-label text classification faces two problems. (1) Confusion of label relations. In hierarchical multi-label long-text data, each label correlates differently with each part of the text, and the labels themselves are related both by correlation and by hierarchy. How to extract and exploit these label-to-label correlations is a major difficulty in hierarchical text classification. (2) Class imbalance. On large-scale text data sets, the class distribution of real data is generally long-tailed. Training for multiple rounds on such data overfits the classes with many samples (head classes) and underfits the classes with few samples (tail classes), which degrades classification performance. How to alleviate class imbalance at the algorithm level is therefore another difficulty of hierarchical text classification.

To address these two problems, this thesis proposes three improved deep-learning-based hierarchical text classification algorithms that aim to alleviate label confusion and class imbalance and to improve the performance of multi-label hierarchical text classification. The main work is as follows.

(1) An improved algorithm, based on adversarial training, that decouples text feature representation learning from classifier learning. To address the underfitting of classes with few samples caused by class imbalance in large-scale text data sets, text feature representation learning and classifier learning are decoupled. To suppress the overfitting of classes with many samples, adversarial training and different sampling strategies are used. The result is a two-stage training method: in the first stage, the text feature representation is learned with instance-balanced sampling and adversarial training; in the second stage, the classifier parameters are adjusted with class-balanced sampling. This alleviates class imbalance and improves the robustness of the model. The improved algorithm is tested on two publicly available data sets. Compared with the original model, it improves Micro-F1 and Macro-F1 by 5.89% and 11.02% on the RCV1 data set, and by 5.01% and 4.34% on the NYTimes data set.

(2) An algorithm that models label confusion relationships with a graph convolutional neural network. Most true-label representations use one-hot encoding, which ignores the sibling and hierarchical relationships between labels. This thesis therefore builds a graph from the label co-occurrence relationships in the data, uses a graph convolutional neural network to extract label relationship features, and integrates them into the one-hot label encoding. Fusing label features with text features mitigates label confusion and improves model performance in large-scale hierarchical text classification. The algorithm is tested on two publicly available data sets. Compared with the original model, it improves Micro-F1 and Macro-F1 by 4.62% and 9.58% on the RCV1 data set, and by 5.39% and 6.43% on the NYTimes data set.

(3) An adversarial domain-adaptive hierarchical text classification algorithm based on maximum mean discrepancy (MMD) and correlation alignment (CORAL). Because some kinds of data are difficult to collect, models trained on them perform poorly and overfit. The data set is therefore divided into head classes and tail classes according to the amount of data. The features of the two parts are mapped to a high-dimensional space, and MMD and CORAL are used to align the feature distributions of the two domains, transferring the rich features of the head-class domain to the tail classes. This alleviates class imbalance and improves the training of the model. The algorithm is tested on two publicly available data sets. Compared with the original model, it improves Micro-F1 and Macro-F1 by 3.31% and 8.79% on the RCV1 data set, and by 6.81% and 5.71% on the NYTimes data set.
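
The abstract does not give implementation details, so the following is only a minimal NumPy sketch of the building blocks it names, not the thesis's own code. It illustrates class-balanced (class-equalized) sampling probabilities for the second training stage, a symmetrically normalized label co-occurrence graph fed through one graph convolution layer, and the two distribution-alignment losses (MMD and correlation alignment, CORAL). All function names are hypothetical. The linear-kernel form of MMD, the Kipf-and-Welling-style normalization with self-loops, and the use of raw co-occurrence counts as edge weights are assumptions made for illustration.

```python
import numpy as np


def class_balanced_probs(Y):
    """Per-sample probabilities for class-balanced sampling (assumed scheme).

    Y is a binary (n_samples, n_labels) matrix. Picking a label uniformly and
    then a sample uniformly among those carrying it yields these probabilities.
    Assumes every label occurs at least once.
    """
    Y = np.asarray(Y, dtype=float)
    label_counts = Y.sum(axis=0)
    return (Y / label_counts).sum(axis=1) / Y.shape[1]


def normalized_cooccurrence(Y):
    """Label co-occurrence graph with self-loops, normalized as D^-1/2 A D^-1/2."""
    Y = np.asarray(Y, dtype=float)
    A = Y.T @ Y                      # A[i, j] = number of samples with labels i and j
    np.fill_diagonal(A, 0.0)         # drop raw label frequencies on the diagonal
    A += np.eye(A.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(A_hat, H, W):
    """One graph convolution layer: ReLU(A_hat @ H @ W)."""
    return np.maximum(A_hat @ H @ W, 0.0)


def linear_mmd(Xs, Xt):
    """Squared MMD with a linear kernel: squared distance between feature means."""
    diff = Xs.mean(axis=0) - Xt.mean(axis=0)
    return float(diff @ diff)


def coral_loss(Xs, Xt):
    """CORAL loss: squared Frobenius distance between covariances, scaled by 4 d^2."""
    d = Xs.shape[1]
    Cs = np.cov(Xs, rowvar=False)
    Ct = np.cov(Xt, rowvar=False)
    return float(np.sum((Cs - Ct) ** 2) / (4.0 * d * d))
```

In this sketch, passing an identity matrix as `H` to `gcn_layer` corresponds to starting from one-hot label encodings, so the output rows can be read as label embeddings that mix in information from co-occurring labels. For the alignment losses, `Xs` and `Xt` would hold feature vectors of head-class and tail-class samples respectively; in a real training loop they would be added to the classification loss with weights that the abstract does not specify.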

Keywords/Search Tags:

Multi-label hierarchical text classification, Class Imbalance, Domain Adaptation, Adversarial Training, Graph Convolutional Neural Network

Related items

1	Research On Class Semantics And Imbalanced Distribution Methods For Multi-Label Text Classification
2	Research And Application Of Unsupervised Domain Adaptation Algorithm Based On Adversarial Training
3	Research On Multi-label Text Classification For Imbalanced Data
4	Research On Feature Extraction Of Multi-label Text Classification
5	Research On Unsupervised Domain Adaptation Method Based On Graph Convolutional Network
6	Research On Deep Multi-Source Domain Adaption Recognition Method Under Complex Data Conditions
7	Graph Adversarial Domain Adaptation For Non-shared-and-Imbalanced Transfer Learning Via Hierarchy Graph Reasoning
8	Class-imbalance Issue In Applying Multi-label Learning To The Study Of Parkinson In Traditional Chinese Medicine Diagnosis
9	Research On Text Classification For Proposals And Construction Of Domain Knowledge Graph
10	Study Of The Classification Method Of Imbalanced Multi-Label Data Based On Label Correlation