
Research On Short Text Classification Based On Graph Attention Networks

Posted on: 2021-10-03
Degree: Master
Type: Thesis
Country: China
Candidate: T Nie
Full Text: PDF
GTID: 2518306104988449
Subject: Computer application technology
Abstract/Summary:
With the spread of smart devices, a large volume of fragmented short text is generated in daily life, such as mobile phone text messages, social media posts, search queries, and product reviews. To mine more business value from this mass of short text, the task of short text classification has received growing attention. Because of its particular properties, short text is harder to classify than long text. On the one hand, short text is brief and its grammar is often non-standard, which leads to sparse features and a lack of information; on the other hand, short text data is large in volume and updated quickly, yet large amounts of labeled training data are lacking. The thesis takes short text classification as its main research goal. By analyzing the advantages and disadvantages of different classification algorithms, it proposes a short text classification algorithm based on graph attention networks. The main contents include:

(1) The Co-occurrence Information Model (CIM) is proposed to construct a graph structure over a short text data set, so that the supplementary information carried by the graph structure can effectively alleviate the sparseness of short text data. Specifically, the short texts in the corpus are segmented into words, the words and short texts are treated as nodes in the graph, and co-occurrence information is used to construct word-word, word-text, and text-text edges. The co-occurrence information is obtained from PMI, TF-IDF, and cosine similarity.

(2) A graph neural network classification model is applied to the constructed graph to classify the short text nodes. Specifically, a graph convolutional network (Graph Convolutional Networks, GCN) is used as the base model to build the CIM-GCN model, and its advantages and disadvantages are analyzed in principle; then an attention mechanism on the graph is introduced to improve the model into a graph attention network (Graph Attention Networks, GAT), giving the CIM-GAT model; further, in order to extract and fuse attention features from different feature subspaces, the CIM-MGATs model is proposed, which mainly draws on the idea of multi-head attention.

(3) To overcome the shortage of training data, a graph-based semi-supervised learning method is constructed. Labeled and unlabeled data are used together to build the graph, enriching its structural information, and the entire graph is then modeled so that label information and data features are effectively propagated through the graph structure. This yields the final representations and prediction results for all nodes in the graph.

Finally, experiments were conducted on short text classification data sets such as HR and MR. The results show that the CIM-GAT and CIM-MGATs models based on graph attention networks not only achieve higher classification accuracy than other models, but are also more robust to the size of the training data.
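
The graph construction in item (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract names PMI, TF-IDF, and cosine similarity but gives no formulas, so the choices below are assumptions made for illustration. These are a sliding window of 3 tokens for PMI, keeping only positive PMI values, a smoothed IDF, cosine similarity computed over TF-IDF vectors, and a similarity threshold of 0.2. The function names `sliding_windows` and `build_graph` and the `doc:`/`word:` node labels are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def sliding_windows(tokens, size):
    """Return all windows of `size` tokens; a short text yields one window."""
    if len(tokens) <= size:
        return [tokens]
    return [tokens[i:i + size] for i in range(len(tokens) - size + 1)]


def build_graph(docs, window=3, sim_threshold=0.2):
    """Build a weighted edge dict {(node_a, node_b): weight} from tokenized docs.

    Assumed edge definitions (not taken from the thesis):
      word-text: TF-IDF with a smoothed IDF
      word-word: positive PMI over sliding windows
      text-text: cosine similarity of TF-IDF vectors above a threshold
    """
    n_docs = len(docs)
    edges = {}

    # Word-text edges weighted by TF-IDF.
    doc_freq = Counter(w for d in docs for w in set(d))
    vectors = []
    for i, d in enumerate(docs):
        tf = Counter(d)
        vec = {
            w: (c / len(d)) * (math.log((1 + n_docs) / (1 + doc_freq[w])) + 1.0)
            for w, c in tf.items()
        }
        vectors.append(vec)
        for w, weight in vec.items():
            edges[(f"doc:{i}", f"word:{w}")] = weight

    # Word-word edges weighted by PMI, estimated from window counts.
    windows = [win for d in docs for win in sliding_windows(d, window)]
    n_win = len(windows)
    word_count, pair_count = Counter(), Counter()
    for win in windows:
        uniq = sorted(set(win))
        word_count.update(uniq)
        pair_count.update(combinations(uniq, 2))
    for (a, b), c in pair_count.items():
        pmi = math.log(c * n_win / (word_count[a] * word_count[b]))
        if pmi > 0:  # keep only positively associated word pairs
            edges[(f"word:{a}", f"word:{b}")] = pmi

    # Text-text edges weighted by cosine similarity of TF-IDF vectors.
    def cosine(u, v):
        dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
        norm_u = math.sqrt(sum(x * x for x in u.values()))
        norm_v = math.sqrt(sum(x * x for x in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    for i, j in combinations(range(n_docs), 2):
        sim = cosine(vectors[i], vectors[j])
        if sim >= sim_threshold:
            edges[(f"doc:{i}", f"doc:{j}")] = sim

    return edges


if __name__ == "__main__":
    toy_docs = [["great", "movie"], ["great", "film"], ["bad", "service"]]
    for edge, weight in sorted(build_graph(toy_docs).items()):
        print(edge, round(weight, 3))
```

In a semi-supervised setting such as the one described in item (3), labeled and unlabeled texts would both be passed in `docs`, so that unlabeled texts still contribute nodes and edges to the graph.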
Keywords/Search Tags:Short Text Classification, Co-occurrence Information, Graph Neural Network, Attention Mechanism, Semi-supervised Learning