Every day, information systems generate large document collections rich in text and linked data, and mining these collections enables their rapid integration. Document classification is a central task of text mining. Mainstream methods derive document representations from word representations: they learn vector representations of words and documents from the contextual information of words and then classify documents. However, such distributed word representations capture only local information about each word, and their performance degrades when the document collection has few labels and its topics are scattered. Topic models, by contrast, exploit global corpus information to obtain word and topic representations with global semantics, from which document representations are built. Graph neural networks model the interactions among words and among documents to learn word representations and classify documents. However, these methods require label information and do not exploit the interactions between documents, so classification suffers when the quality of the text is low and labels are scarce. Building on existing research into word representation learning, graph-neural-network classification models, and topic models, this thesis designs, implements, and evaluates unsupervised text classification models based on the textual information of a document collection, fused with the textual attributes of the linked-document network. The main contributions are as follows:

(1) TextING_TM (Inductive Text classification via GNN with a Topic Model), an unsupervised document clustering model that integrates a topic model with a graph neural network. TextING constructs a word co-occurrence graph for each document, learns document representations over all the word graphs with a GCN, and then trains a document classification model in a supervised manner. However, this method requires document
labels, and word-graph-based document representations cannot learn the global features of words. Therefore, the unsupervised text classification model TextING_TM is proposed. The model first uses ETM to learn document representations that contain global word features, applies K-means clustering to the learned document-topic representations to obtain pseudo-labels, and then uses TextING to train a document classification model. Classification accuracy improves over ETM by 1.73%, 1.47%, and 1.1% on the MR, R8, and 20NG datasets, respectively.

(2) SuperGAT-GC (Self-supervised Graph ATtention network with Graph Clustering), a graph-neural-network document classification model that exploits the document network. DAEGC learns a graph embedding of the attributed link network through an attention mechanism and jointly optimizes its parameters with K-means clustering to achieve unsupervised text classification. However, the graph-autoencoder part of that model copes poorly with noisy graphs. Therefore, the unsupervised text classification model SuperGAT-GC, which mitigates the influence of graph noise, is proposed. The model replaces the GAT in the autoencoder with SuperGAT to learn embedding representations from noisy graphs, and applies K-means clustering for unsupervised text classification. Experiments show that the model's classification accuracy on the Cora and Citeseer datasets is 1.3% and 1% higher than that of DAEGC.
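Both contributions share the same "learn an embedding, then cluster it into pseudo-labels, then train a classifier" pipeline. The following is a minimal sketch of that idea, not the thesis implementation: the real TextING_TM derives document-topic vectors from ETM and trains a TextING GCN, whereas here synthetic Dirichlet vectors and a small NumPy K-means stand in for both, purely to illustrate the pseudo-labeling step.

```python
# Sketch of the "embed -> cluster -> pseudo-label" pipeline (assumed stand-ins:
# random Dirichlet vectors replace ETM output; no GCN is trained here).
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: returns one cluster id per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center by Euclidean distance.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if its cluster empties.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Stand-in for ETM output: one topic-proportion vector per document.
rng = np.random.default_rng(0)
doc_topics = rng.dirichlet(np.ones(20), size=300)

# K-means over the doc-topic vectors yields pseudo-labels, which would then
# serve as training targets for a supervised classifier such as TextING.
pseudo_labels = kmeans(doc_topics, k=3)
print(pseudo_labels.shape)  # (300,)
```

The pseudo-labels substitute for the missing ground-truth labels, which is what lets the otherwise supervised classifier be trained without annotation.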