Every day, information systems generate large document collections rich in text and linked data, and mining these collections enables their rapid integration. Document classification is a central task of text mining. Mainstream methods derive document representations from word representations: they learn vector representations of words and documents from the contextual information of words and then classify documents. However, such distributed word representations capture only local information about each word, and their performance degrades when the document collection has few labels and its topics are scattered. Topic models, by contrast, exploit global corpus information to obtain word and topic representations with global semantics, from which document representations are built. Graph neural networks model the interactions among words and among documents to learn word representations and classify documents. However, these methods require label information and do not exploit the interactions between documents, so classification suffers when the quality of the text is low and labels are scarce. Building on existing research into word representation learning, graph-neural-network classification models, and topic models, this thesis designs, implements, and evaluates unsupervised text classification models based on the textual information of a document collection, fused with the textual attributes of the linked-document network. The main contributions are as follows:

(1) TextING_TM (Inductive Text classification via GNN with a Topic Model), an unsupervised document clustering model that integrates a topic model with a graph neural network. TextING constructs a word co-occurrence graph for each document, learns document representations over all the word graphs with a GCN, and then trains a document classification model in a supervised manner. However, this method requires document
labels, and word-graph-based document representations cannot learn the global features of words. Therefore, the unsupervised text classification model TextING_TM is proposed. The model first uses ETM to learn document representations that contain global word features, applies K-means clustering to the learned document-topic representations to obtain pseudo-labels, and then uses TextING to train a document classification model. Classification accuracy improves over ETM by 1.73%, 1.47%, and 1.1% on the MR, R8, and 20NG datasets, respectively.

(2) SuperGAT-GC (Self-supervised Graph ATtention network with Graph Clustering), a graph-neural-network document classification model that exploits the document network. DAEGC learns a graph embedding of the attributed link network through an attention mechanism and jointly optimizes its parameters with K-means clustering to achieve unsupervised text classification. However, the graph-autoencoder part of that model copes poorly with noisy graphs. Therefore, the unsupervised text classification model SuperGAT-GC, which mitigates the influence of graph noise, is proposed. The model replaces the GAT in the autoencoder with SuperGAT to learn embedding representations from noisy graphs, and applies K-means clustering for unsupervised text classification. Experiments show that the model's classification accuracy on the Cora and Citeseer datasets is 1.3% and 1% higher than that of DAEGC.
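Both contributions share the same "learn an embedding, then cluster it into pseudo-labels, then train a classifier" pipeline. The following is a minimal sketch of that idea, not the thesis implementation: the real TextING_TM derives document-topic vectors from ETM and trains a TextING GCN, whereas here synthetic Dirichlet vectors and a small NumPy K-means stand in for both, purely to illustrate the pseudo-labeling step.

```python
# Sketch of the "embed -> cluster -> pseudo-label" pipeline (assumed stand-ins:
# random Dirichlet vectors replace ETM output; no GCN is trained here).
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: returns one cluster id per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center by Euclidean distance.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if its cluster empties.
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Stand-in for ETM output: one topic-proportion vector per document.
rng = np.random.default_rng(0)
doc_topics = rng.dirichlet(np.ones(20), size=300)

# K-means over the doc-topic vectors yields pseudo-labels, which would then
# serve as training targets for a supervised classifier such as TextING.
pseudo_labels = kmeans(doc_topics, k=3)
print(pseudo_labels.shape)  # (300,)
```

The pseudo-labels substitute for the missing ground-truth labels, which is what lets the otherwise supervised classifier be trained without annotation.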