
Research On Multi-Label Image Recognition Based On Graph Neural Network

Posted on: 2023-12-24 | Degree: Master | Type: Thesis
Country: China | Candidate: H Y Lei | Full Text: PDF
GTID: 2568306794994129 | Subject: Mathematics
Abstract/Summary:
Multi-label image recognition is a fundamental and practical task in computer vision that aims to predict the presence of multiple objects in an image simultaneously. The task has considerable research value because it is widely used in practical scenarios such as attribute partitioning, object detection, and search engines. Compared with traditional single-label image recognition, modeling is more challenging because images in multi-label tasks contain rich semantic information. Modeling the correlation between labels is the core research topic in multi-label image recognition. The rapid development of graph neural networks in deep learning has accelerated research on this task and offered a new approach to learning label correlation.

Building on related work, this thesis proposes a multi-label image recognition algorithm using the Graph Attention Network (GAT). The algorithm first designs a label classifier module based on GAT. The module uses word vectors trained by the GloVe model as the prior input of the label nodes and adaptively generates the label relationship graph by cosine similarity. The GAT uses a mask mechanism to automatically assign different weights to neighbor nodes, so that each node in the label relationship graph aggregates information from the nodes connected to it. Next, a ResNet-101 network extracts the visual features of the image. Finally, the learned label classifiers are combined with the visual features to complete multi-label image recognition and thereby improve recognition performance.

When previous multi-label image recognition methods combine the label classifier matrix with image visual features, they often use a simple dot product for fusion. This ignores the complex interaction between modalities and seriously limits the model's convergence speed and recognition accuracy. To solve this problem, this thesis builds on the GAT-based algorithm and introduces multi-modal factorized bilinear pooling (MFB) as a tool for fusing cross-modal embeddings, proposing a fast GAT-based multi-modal fusion algorithm for multi-label image recognition. The algorithm consists of three key modules: (1) an image visual feature extraction module, which uses ResNet-101 to learn and generate image visual features; (2) a label classifier learning module, which first uses word embedding to obtain the label vectors and then uses the GAT to learn classifiers that capture label correlation; and (3) an MFB module, which designs an MFB fusion model suited to multi-label image recognition and cascades multiple MFBs to fuse the classifiers and image features and complete recognition. To verify the effectiveness of the proposed algorithm, experiments were conducted on the internationally recognized datasets Pascal VOC 2007 and MS-COCO 2014 and compared with current strong algorithms. The results show that the algorithm improves multi-label image recognition accuracy and speeds up model convergence.
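
The three operations named in the abstract (a label graph built from cosine similarity, masked attention over graph neighbors, and MFB fusion) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names, the `threshold` hyperparameter, and the toy matrices `U` and `V` are hypothetical; the attention pass omits GAT's learned projection and LeakyReLU scoring and takes an arbitrary `score` function instead; and the MFB step omits the normalization layers usually applied after pooling.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def build_label_graph(embeddings, threshold=0.5):
    """Binary adjacency over labels: connect two labels when their word
    vectors are similar enough. Self-loops are always kept.
    The threshold is an assumed hyperparameter."""
    n = len(embeddings)
    return [
        [1 if i == j or cosine_similarity(embeddings[i], embeddings[j]) >= threshold else 0
         for j in range(n)]
        for i in range(n)
    ]

def masked_attention(features, adjacency, score):
    """Simplified attention pass: each node becomes a weighted average of its
    neighbors, with weights from a softmax restricted to connected nodes
    (the mask). `score` stands in for GAT's learned scoring function."""
    out = []
    for i, row in enumerate(adjacency):
        neighbors = [j for j, connected in enumerate(row) if connected]
        logits = [score(features[i], features[j]) for j in neighbors]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - peak) for l in logits]
        total = sum(exps)
        weights = [e / total for e in exps]
        dim = len(features[i])
        out.append([sum(w * features[j][d] for w, j in zip(weights, neighbors))
                    for d in range(dim)])
    return out

def mfb_fuse(x, y, U, V, k):
    """MFB core: project both inputs, multiply element-wise, then sum-pool
    over non-overlapping windows of size k. U and V hold one projection
    row per output of the expanded (k * o)-dimensional space."""
    px = [sum(u * a for u, a in zip(row, x)) for row in U]
    py = [sum(v * b for v, b in zip(row, y)) for row in V]
    joint = [a * b for a, b in zip(px, py)]
    return [sum(joint[i:i + k]) for i in range(0, len(joint), k)]

# Toy usage with three 2-dimensional "label embeddings".
labels = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
graph = build_label_graph(labels, threshold=0.5)
dot = lambda a, b: sum(p * q for p, q in zip(a, b))
classifiers = masked_attention(labels, graph, dot)
proj = [[1, 0], [0, 1], [1, 1], [1, -1]]
fused = mfb_fuse([1, 2], [3, 4], proj, proj, k=2)
```

In the toy usage, the first two labels are linked because their vectors nearly coincide, while the third stays isolated and keeps its own features after the attention pass. Unlike a plain dot product, the MFB step lets every pair of projected dimensions interact before pooling, which is the cross-modal interaction the abstract refers to.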
Keywords/Search Tags:machine learning, graph attention network, multimodal fusion, label correlation, multi-label image recognition