
Image Difference Captioning With Instance-level Fine-grained Feature Representation

Posted on: 2022-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liang
Full Text: PDF
GTID: 2518306536453314
Subject: Control Engineering
Abstract/Summary:
The rapid development of related technologies in computer vision and natural language processing has greatly promoted the derivation and study of their cross-tasks. Image difference captioning, as a sub-task of image captioning, has attracted wide attention in the community and has great research value. This task aims at locating the changed objects in a pair of similar images and describing the differences in natural language. The key challenges are to sufficiently comprehend the context of the image pair and to accurately locate the changed objects in the presence of viewpoint change, so as to generate a comprehensive and accurate difference caption. Previous studies focus on pixel-level image features and neglect the rich explicit features of the objects in an image pair, which are beneficial for generating fine-grained difference captions. In addition, existing generative models struggle to accurately locate the differences under the interference of viewpoint change.

To address these issues, this thesis proposes an Instance-Level Fine-Grained Difference Captioning (IFDC) model, which consists of a fine-grained feature extraction module, a multi-round feature fusion module, a similarity-based difference finding module, and a difference captioning module. First, to describe the changed objects in image pairs comprehensively, this thesis designs a fine-grained feature extraction module, which extracts fine-grained features, i.e., visual, semantic, and positional features at the instance level, as the objects' representation. Then, a multi-round feature fusion module is devised to fully fuse these multi-modality features of the objects. Next, to enhance the model's robustness to viewpoint change, this thesis designs a similarity-based difference finding module: when locating the changed objects in an image pair, the model attends to the objects themselves rather than to the viewpoint change, so the changed objects are located accurately. Finally, this thesis employs a difference captioning module to generate the difference captions.

Extensive experiments, including comparison experiments, ablation studies, visualization analysis, and case studies, have been conducted. The results show that the proposed IFDC model achieves performance comparable to state-of-the-art models on the CLEVR-Change and Spot-the-Diff datasets. They further demonstrate that representing each object in an image pair with instance-level fine-grained features is conducive to generating more comprehensive image difference captions; that fusing the fine-grained features of objects over multiple rounds is beneficial for fully integrating these multi-modality features; and that using the similarity-based difference finding method to locate the changed objects can relieve the interference of viewpoint change to a certain extent, thereby helping the model locate the changed objects accurately. These findings provide an instance-level solution for the image difference captioning task and further advance its development.
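To make the described pipeline concrete, the sketch below illustrates, in PyTorch-style pseudocode, how the four modules could fit together: multi-round fusion of instance-level visual, semantic, and positional features, similarity-based matching of objects across the image pair to isolate the changed ones, and a decoder that captions the differences. This is an illustrative sketch only, not the thesis implementation; all class names, layer choices, dimensions, and the similarity threshold are hypothetical assumptions.

```python
# Hypothetical sketch of an IFDC-style pipeline (not the author's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class IFDCSketch(nn.Module):
    def __init__(self, feat_dim=512, vocab_size=10000, fusion_rounds=2):
        super().__init__()
        # Multi-round feature fusion: repeatedly mix the visual, semantic, and
        # positional features of each detected object (assumed already
        # extracted, e.g. by an object detector) into one representation.
        self.fuse = nn.ModuleList(
            [nn.Linear(3 * feat_dim, 3 * feat_dim) for _ in range(fusion_rounds)]
        )
        self.project = nn.Linear(3 * feat_dim, feat_dim)
        # Difference captioning: a simple LSTM decoder over difference features.
        self.decoder = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.word_head = nn.Linear(feat_dim, vocab_size)

    def fuse_instances(self, visual, semantic, positional):
        # Each input: (num_objects, feat_dim). Returns (num_objects, feat_dim).
        x = torch.cat([visual, semantic, positional], dim=-1)
        for layer in self.fuse:                  # multi-round fusion
            x = x + torch.relu(layer(x))
        return self.project(x)

    def find_differences(self, objs_before, objs_after, sim_threshold=0.8):
        # Similarity-based difference finding: match objects across the image
        # pair by cosine similarity; objects with no sufficiently similar
        # counterpart are treated as changed, regardless of viewpoint shift.
        sim = F.cosine_similarity(
            objs_before.unsqueeze(1), objs_after.unsqueeze(0), dim=-1
        )                                        # (N_before, N_after)
        removed = objs_before[sim.max(dim=1).values < sim_threshold]
        added = objs_after[sim.max(dim=0).values < sim_threshold]
        return torch.cat([removed, added], dim=0)

    def caption(self, diff_feats, max_len=15):
        # Greedy decoding from the pooled difference representation.
        state = None
        step_in = diff_feats.mean(dim=0, keepdim=True).unsqueeze(0)  # (1, 1, D)
        words = []
        for _ in range(max_len):
            out, state = self.decoder(step_in, state)
            words.append(self.word_head(out[:, -1]).argmax(dim=-1))
            step_in = out
        return torch.stack(words, dim=1)         # (1, max_len) token ids
```

In this reading, viewpoint robustness comes from comparing object representations directly (cosine similarity between instances) rather than comparing pixel locations, so a globally shifted but otherwise unchanged object still finds a high-similarity match and is not reported as a difference.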
Keywords/Search Tags: Image difference captioning, Change captioning, Instance-level, Fine-grained feature extraction, Similarity-based difference finding