As the Internet develops rapidly, the volume and variety of data on the network grow daily, and text has gradually become the most widely distributed and information-rich carrier because it consumes few network resources and is easy to access. As an efficient text retrieval and mining technology, text classification can quickly and accurately locate relevant information, helping people make decisions and improving work efficiency. It is a widely studied problem in natural language processing (NLP) and underlies many real-world applications such as news filtering, spam detection, dialogue act classification, and question answering.

Deep learning models have become the dominant approach in the NLP literature. Compared with hand-crafted indicator features, sentence representations obtained by neural networks are less sparse and more flexible in encoding complex syntactic and semantic information. In recent years, thanks to their ability to handle complex structures and to better preserve global information, graph neural networks (GNNs) have attracted widespread interest. They treat text not as a sequence but as a set of parallel words. Data from many real applications can be naturally converted into graphs, the graph being a natural structure for describing complex relationships between tokens, and recent advances in GNNs provide a powerful tool for modeling such graph-structured data.

Nevertheless, existing graph-based models have three major shortcomings. First, text graphs constructed by human-defined rules are not real graph data: unreasonable settings in the construction process may introduce considerable noise. Second, with a fixed corpus-level graph structure, test documents must be accessed during training, and such models do not make full use of the labeled and unlabeled information of the nodes. Finally, these models rely only on word co-occurrence and term frequency-inverse document frequency (TF-IDF) information to construct the graph, ignoring the contextual information of the text to a certain extent. Meanwhile, contrastive learning has developed into an effective method for fully exploiting node information in the graph domain.

To address these issues, this paper proposes two new GNN-based text classification models: CGA2TC, a graph contrastive convolutional text classification algorithm based on adaptive augmentation, and TextGGRU, a graph contrastive convolutional text classification algorithm based on a global multi-view design.

CGA2TC introduces an adaptive augmentation strategy to obtain more robust node representations. First, building on existing research, we exploit word co-occurrence and document-word relationships to construct a text graph. Second, we design an adaptive augmentation strategy for the noisy text graph that generates two attribute-invariant contrastive views, effectively mitigating the noise problem while preserving the intrinsic topology of the data. Specifically, we design two augmentation strategies on the topology of the text graph, one noise-based and one centrality-based, which perturb unimportant connections and thus highlight relatively important edges.
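To make the centrality-based strategy concrete, the following is a minimal sketch rather than the paper's exact procedure: it scores each edge by the mean degree centrality of its endpoints and drops low-importance edges with higher probability, so that relatively important edges survive in the augmented view. The function name `centrality_based_view`, the choice of degree centrality, the log scaling, and the probability bounds `p_min`/`p_max` are illustrative assumptions.

```python
import numpy as np
import networkx as nx

def centrality_based_view(G: nx.Graph, p_min=0.1, p_max=0.7, seed=0):
    """Drop unimportant edges: edges between low-centrality nodes are
    removed with higher probability, preserving the important topology."""
    rng = np.random.default_rng(seed)
    cent = nx.degree_centrality(G)                     # node importance
    edges = list(G.edges())
    # edge importance = mean endpoint centrality (log-scaled for stability)
    imp = np.array([np.log1p((cent[u] + cent[v]) / 2) for u, v in edges])
    # map importance to drop probability: high importance -> low drop prob
    norm = (imp.max() - imp) / (imp.max() - imp.min() + 1e-12)
    p_drop = p_min + (p_max - p_min) * norm
    view = G.copy()
    for (u, v), p in zip(edges, p_drop):
        if rng.random() < p:
            view.remove_edge(u, v)
    return view

# toy usage: words as nodes, co-occurrence as edges; two seeds
# give the two contrastive views fed to the contrastive objective
G = nx.Graph([("graph", "neural"), ("neural", "network"),
              ("network", "text"), ("text", "classification")])
view1 = centrality_based_view(G, seed=1)
view2 = centrality_based_view(G, seed=2)
```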
Meanwhile, we design a joint contrastive loss to improve the contrastive representations used for node classification; an illustrative sketch of this loss is given at the end of this section. The joint contrastive loss comprises two components: a supervised contrastive loss and a self-supervised contrastive loss. For the supervised contrastive loss, we construct multiple positive sample pairs for each anchor node, using the known labels as supervision. These positive samples belong to the same class as the anchor and are drawn from both augmented views, yielding intra-view and inter-view positive pairs (and likewise for negative samples). Constructing multiple positive pairs for labeled nodes allows each class to obtain a more discriminative representation during training. For the self-supervised contrastive loss, each positive pair consists of an anchor node and its corresponding augmented node, while the negative samples are the remaining nodes in both views. Then, to better combine the GNN with the contrastive learning framework, we design the GNN as a structure consisting of an encoder with feature extraction capability and a classifier.

TextGGRU, in contrast, does not augment the graph topology to produce two views; instead, it applies two types of graph neural networks that generate node representations from two different perspectives, revealing local and global cues that complement each other and enrich the final representation.

This paper validates the performance of the proposed models on four datasets commonly used for text classification and compares them with currently popular models: word embedding-based models (PV-DBOW/DM, FastText, SWEM, LEAM), sequential deep learning models (CNN, LSTM, Bi-LSTM), and graph-based representation learning models (Graph-CNN-C/S/F, TextGCN, Text-Level-GCN, DHTG, TextING, TextQGNN). The experimental results indicate that CGA2TC and TextGGRU are more competitive than the comparison models in classification performance. In addition, to demonstrate the effectiveness of CGA2TC and TextGGRU more comprehensively, this paper reports evaluation metrics beyond accuracy, namely recall and F-score, and conducts multiple groups of ablation and parameter experiments against models such as TextGCN.
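As a rough illustration of the joint objective described above, the PyTorch sketch below combines a self-supervised InfoNCE-style term, where each anchor is paired with its counterpart in the other view, with a supervised term that treats all same-class nodes in both views (intra-view and inter-view) as positives. The function name `joint_contrastive_loss`, the temperature `tau`, the weighting coefficient `lam`, and the sum-inside-log form of the supervised term are our assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def joint_contrastive_loss(z1, z2, labels, tau=0.5, lam=0.5):
    """z1, z2: (N, d) node embeddings from the two augmented views.
    labels: (N,) LongTensor; -1 marks unlabeled nodes."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                    # (2N, d), both views
    sim = torch.exp(z @ z.t() / tau)                  # pairwise similarities
    n = z1.size(0)
    eye = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    denom = (sim * ~eye).sum(dim=1)                   # all candidates but self

    # self-supervised term: each node's positive is its other-view copy
    idx = torch.arange(n, device=z.device)
    pos_self = torch.cat([sim[idx, idx + n], sim[idx + n, idx]])
    loss_self = -torch.log(pos_self / denom).mean()

    # supervised term: labeled anchors pull in every same-class node,
    # both intra-view and inter-view
    lab = torch.cat([labels, labels])
    labeled = lab >= 0
    mask = (lab.unsqueeze(0) == lab.unsqueeze(1)) & ~eye
    mask &= labeled.unsqueeze(0) & labeled.unsqueeze(1)
    anchors = mask.any(dim=1)
    if anchors.any():
        pos_sum = (sim * mask).sum(dim=1)
        loss_sup = -torch.log(pos_sum[anchors] / denom[anchors]).mean()
    else:
        loss_sup = z.new_zeros(())                    # no labeled anchors
    return lam * loss_sup + (1 - lam) * loss_self

# toy usage: 4 nodes, 2 labeled (classes 0 and 1), 2 unlabeled (-1)
z1, z2 = torch.randn(4, 16), torch.randn(4, 16)
labels = torch.tensor([0, 1, -1, -1])
loss = joint_contrastive_loss(z1, z2, labels)
```

In the full model, `z1` and `z2` would be produced by the shared GNN encoder applied to the two contrastive views, with the classifier trained on top of the resulting representations.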