Research On Multi-object Image Recognition Methods Based On Graph Convolution Network

Posted on: 2022-08-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y T Wang
GTID: 1488306575451944
Subject: Computer Science and Technology
Abstract/Summary:
With the development of information technology, the rise of social networks, and the popularity of intelligent terminals, multimedia data represented by images are growing explosively. Image recognition has become a challenging topic and has attracted increasing attention. Existing convolutional neural network (CNN) based methods have achieved good results in image recognition. However, most of them are trained on single-object images, so they neither fully learn the correlation between different objects within a multi-object image nor fully consider the similarity between different multi-object images.

Three problems therefore remain open in multi-object image recognition: (1) with supervision information, existing image classification methods fail to fuse image features and label embeddings efficiently, which severely slows model convergence; (2) with supervision information, existing image classification methods cannot integrate global label dependencies into image visual consistency, which limits further performance improvement; (3) without supervision information, existing image hashing methods cannot effectively learn the similarity between multi-object images, which limits the quality of the hash codes and the performance of image retrieval. To address these problems, this thesis makes the following main contributions:

(1) To address the low efficiency of fusing image features and label embeddings in multi-object image classification, we propose F-GCN, a fast multi-object image classification model based on cross-modal feature fusion. F-GCN consists mainly of an image feature extraction module, a label co-occurrence embedding learning module, and a cross-modal feature fusion module. The first two modules use a CNN and a graph convolution network (GCN), respectively, to learn image features and label co-occurrence embeddings. The cross-modal feature fusion module improves the Multi-modal Factorized Bilinear pooling (MFB) component to fuse image features and label co-occurrence embeddings efficiently, which greatly speeds up model convergence and further improves recognition performance. Extensive experiments on two multi-object image datasets (MS-COCO and VOC2007) show that F-GCN converges more than 11 times faster than state-of-the-art methods. Its classification performance is also slightly better on the conventional image classification evaluation metrics.

(2) To integrate global label dependencies into image visual consistency, we combine GCN with class activation mapping (CAM) and propose G-CAM, a multi-object image recognition model. G-CAM consists mainly of an image feature extraction module and a label co-occurrence embedding learning module. The former uses two CNNs with shared weights to generate the feature maps and feature vectors of the original image and of its transformed version, while the latter learns weighted classifiers that contain the label co-occurrence embeddings between different objects. G-CAM replaces the original fully-connected classification layer with these weighted classifiers, which effectively preserves visual consistency under different transforms of an image. Extensive experiments on three multi-object image datasets (FLICKR25K, MS-COCO, and NUS-WIDE) show that G-CAM improves classification performance by 0.8%-1% mAP.

(3) To address the problem that unsupervised hashing methods cannot effectively learn the similarity between multi-object images, we propose NRDH, an unsupervised deep image hashing method that consists mainly of a node representation learning stage and a hash function learning stage. In the first stage, NRDH treats each image as a node and uses a GCN-based autoencoder that captures the similarity between images and generates a representation of each image in an unsupervised way. In the second stage, NRDH uses these node representations as supervision information to guide hash function learning and generate a hash code for each image. Extensive experiments on a single-object image dataset (CIFAR-10) and two multi-object image datasets (MS-COCO and FLICKR25K) show that NRDH produces better results on multi-object images, with a 0.5%-3% mAP improvement over existing unsupervised image hashing methods.
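As a rough illustration of the label co-occurrence embedding idea described in the abstract, the sketch below applies the standard GCN propagation rule, H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W), to a tiny label graph and then scores an image feature against the resulting per-label vectors. It is not the thesis's implementation: the 3-label co-occurrence matrix, the 2-dimensional embeddings, the identity weight matrix, and all function names are hypothetical and chosen only to keep the example small.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square, non-negative adjacency matrix."""
    n = len(adj)
    # Add self-loops so each label keeps part of its own embedding.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(a_norm, h, w):
    """One graph convolution: ReLU(a_norm @ h @ w)."""
    out = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in out]


def label_scores(classifiers, image_feature):
    """Dot each per-label vector with the image feature vector."""
    return [sum(c * f for c, f in zip(row, image_feature)) for row in classifiers]


# Hypothetical data: label 1 co-occurs with labels 0 and 2.
co_occurrence = [[0.0, 1.0, 0.0],
                 [1.0, 0.0, 1.0],
                 [0.0, 1.0, 0.0]]
label_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # assumed 2-d embeddings
weights = [[1.0, 0.0], [0.0, 1.0]]                        # identity, for readability

a_norm = normalize_adjacency(co_occurrence)
classifiers = gcn_layer(a_norm, label_embeddings, weights)
scores = label_scores(classifiers, [1.0, 0.0])            # assumed CNN feature vector
```

In a real model the weights would be learned and the image feature would come from a CNN backbone; here they are fixed so that the effect of the normalized adjacency on each label vector is easy to inspect.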
Keywords/Search Tags: Multi-object image classification, Unsupervised image hashing, Graph convolution network, Cross-modal feature fusion, Class activation mapping, AutoEncoder