Research On Multi-label Image Classification Method Based On Graph Attention Network

Posted on: 2022-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
GTID: 2518306743463464
Subject: Computer Science and Technology
Abstract/Summary:
Most real-world images carry multiple labels. Beyond using the visual features of the images themselves, a graph neural network can be used to extract the correlations between the labels that appear in them. ML-GCN is a multi-label classification model that uses a graph convolutional network to learn label relationships. Although ML-GCN performs well on multi-label image classification, it has two shortcomings. First, a graph convolutional network does not capture the asymmetric relationships between labels well. Second, the dimension of the multi-label category co-occurrence embedding matrix it generates is much larger than the number of labels to be classified, and the resulting redundancy degrades classification performance. To address both shortcomings, this thesis proposes MLL-GAT, a multi-label image classification model based on the graph attention network.

To capture asymmetric label relationships, MLL-GAT uses the graph attention network to generate an asymmetric attention coefficient for each pair of labels, and builds from these coefficients a multi-label category co-occurrence embedding matrix that encodes the asymmetric relationships. Computing the attention coefficients requires two inputs, a label word embedding matrix and a label co-occurrence relationship matrix, so MLL-GAT obtains and processes both beforehand.

The label word embedding matrix is obtained by extracting the word embedding of each label word with a pre-trained BERT model and stacking the embeddings into a single matrix covering all label words. Its dimension is fixed by the parameters of the pre-trained BERT model and is therefore high. Feeding this high-dimensional matrix directly into the graph attention network hinders the extraction of asymmetric label relationships, so the model uses a rectangular filter to reduce its dimension: convolving and subsampling the high-dimensional label word embedding matrix with the filter yields a low-dimensional matrix that retains the label word information.

The label co-occurrence relationship matrix is built from co-occurrence statistics of the labels in the multi-label image dataset. Its rows and columns correspond to labels, and each element is derived from the number of times the two labels appear together. Using this matrix directly as the basis of the label relationships introduces noise and over-smoothing. To suppress noise, MLL-GAT applies a threshold: elements greater than the threshold are set to 1 and elements less than the threshold are set to 0. To counter over-smoothing, the matrix is then re-weighted.

To avoid the overly high-dimensional multi-label category co-occurrence embedding matrix that ML-GCN produces with its graph convolutional network, MLL-GAT feeds both the label relationship matrix and the low-dimensional label word embedding matrix through two graph attention layers, and sets the output dimension of each layer according to the label categories actually present in the dataset. The result is a multi-label category co-occurrence embedding matrix whose dimension matches the number of labels in the dataset.

MLL-GAT passes the original multi-label image through a convolutional neural network to extract general image features, with the dimension of the extracted feature vector set equal to the dimension of a single label's embedding in the multi-label category co-occurrence embedding matrix. Combining this matrix with the image feature vector yields a prediction score for each label of each image. Experimental results on two multi-label image datasets, VOC 2007 and NUS-WIDE, show that MLL-GAT can effectively solve the multi-label image classification problem when training samples are sufficient and label categories are clearly annotated.
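
The label-graph steps described in the abstract (co-occurrence counting, thresholding, re-weighting, then attention over the resulting graph) can be illustrated with a small sketch. This is not the thesis implementation, and several details are assumptions the abstract does not give: counts are normalized to conditional probabilities P(j | i) before thresholding and the re-weighting puts 1 - p on the diagonal (both borrowed from the ML-GCN formulation), ties at the threshold map to 0, the attention is a single head without the learned linear projection, and all function names, the toy label sets, and the vectors `a_src` and `a_dst` are hypothetical.

```python
import math


def cooccurrence(label_sets, num_labels):
    """Count label occurrences and pairwise co-occurrences over a dataset."""
    occurrences = [0] * num_labels
    counts = [[0] * num_labels for _ in range(num_labels)]
    for labels in label_sets:
        for i in labels:
            occurrences[i] += 1
            for j in labels:
                if i != j:
                    counts[i][j] += 1
    return counts, occurrences


def binarize(counts, occurrences, threshold):
    """Assumption: normalize to P(j | i) = counts[i][j] / occurrences[i], then map to 0/1."""
    n = len(counts)
    return [
        [1 if occurrences[i] and counts[i][j] / occurrences[i] > threshold else 0
         for j in range(n)]
        for i in range(n)
    ]


def reweight(binary, p):
    """Assumption (ML-GCN style): 1 - p on the diagonal, p shared among a row's neighbours."""
    n = len(binary)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        degree = sum(binary[i][j] for j in range(n) if j != i)
        for j in range(n):
            if i == j:
                out[i][j] = 1.0 - p
            elif degree:
                out[i][j] = p * binary[i][j] / degree
    return out


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def attention(h, adjacency, a_src, a_dst, slope=0.2):
    """Single-head GAT-style coefficients, softmax-normalized over each row's neighbours.

    h is assumed to be already projected; the learned weight matrix is omitted.
    """
    n = len(h)
    alpha = [[0.0] * n for _ in range(n)]
    for i in range(n):
        scores = {}
        for j in range(n):
            if i == j or adjacency[i][j] > 0:  # self-loop plus graph neighbours
                e = dot(a_src, h[i]) + dot(a_dst, h[j])
                scores[j] = e if e > 0 else slope * e  # LeakyReLU
        top = max(scores.values())  # subtract the max for numerical stability
        total = sum(math.exp(s - top) for s in scores.values())
        for j, s in scores.items():
            alpha[i][j] = math.exp(s - top) / total
    return alpha


# Toy dataset: each set lists the label indices present in one image.
label_sets = [{0, 1}, {0, 1}, {0}, {0}, {0, 2}, {2}]
counts, occurrences = cooccurrence(label_sets, num_labels=3)
binary = binarize(counts, occurrences, threshold=0.45)
weighted = reweight(binary, p=0.2)

# Hypothetical low-dimensional label embeddings and attention vectors.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alpha = attention(h, weighted, a_src=[0.5, -0.3], a_dst=[0.2, 0.7])
```

In this toy data label 1 always appears with label 0, but label 0 appears with label 1 in only two of its five images, so the thresholded matrix keeps the edge from 1 to 0 and drops the reverse one. The attention coefficients inherit that direction, and because the score uses separate source and target vectors, they would generally differ between (i, j) and (j, i) even on a symmetric graph.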
Keywords/Search Tags: Multi-label learning, Deep learning, Graph attention network, Dimensionality reduction