Research On Semantic Description Method Of Image Changes Based On Representation Interaction And Cycle Consistency

Posted on:2024-05-22

Degree:Master

Type:Thesis

Country:China

Candidate:S B Yue

GTID:2568307112952069

Subject:Communication and Information System

Abstract/Summary:

Humans can perceive changes in a dynamic environment and convey relevant information about them, but this is arduous for computer systems. Recent work attempts to describe the detailed semantic changes between a pair of images in a natural-language sentence, a task known as change captioning. It is a novel branch of cross-media intelligence. Compared with conventional image captioning, change captioning is more complex and challenging: it requires not only understanding the visual content of each image, but also locating the differences between the two images and generating a sentence that summarizes them. It has widespread applications, such as fault detection, video surveillance, medical diagnosis, and multimedia human-computer interaction. Existing work is mainly based on the neural encoder-decoder framework. Despite significant progress, three limitations remain: (1) the fine-grained interaction within each image of a pair; (2) the learning of the difference representation between images under the distraction of irrelevant changes; and (3) the semantic association between the visual features and the visual words. To address these issues, the main contributions are as follows.

(1) This dissertation proposes an intra- and inter-representation interaction network for change captioning. In a dynamic world, a model must go beyond general visual perception and describe the actual change from different perspectives. Previous approaches lack an understanding of object relations within each image and a fine-grained mechanism for matching objects across images, so they learn a false difference representation. To tackle this problem, this dissertation proposes an intra- and inter-representation interaction network that learns a reliable difference representation under the distraction of irrelevant changes. At its core, the intra-representation interaction network models the comprehensive semantic-positional interactions within each image, which not only helps the model explore fine-grained semantic change but can also be regarded as prior knowledge for handling viewpoint change. The inter-representation interaction network matches multi-level correspondences (feature, position-interaction, and semantic-interaction) between the images, from coarse to fine, to learn the semantic change. The proposed approach outperforms state-of-the-art methods on the existing change captioning benchmarks CLEVR-Change, CLEVR-DC, and Spot-the-Diff. The experimental results validate that the intra- and inter-representation interaction mechanisms can effectively construct a reliable difference representation.

(2) This dissertation proposes a representation interaction and cycle consistency network for change captioning. A well-aligned visual-textual semantic association is crucial for generating accurate captions. Existing approaches rely only on attention mechanisms to align visual and textual features, which is insufficient to map change features to their target words. Therefore, building on the intra- and inter-representation interaction model above, this dissertation further proposes a representation interaction and cycle consistency network. The model introduces an additional semantic consistency constraint over the visual-word-visual cycle, correcting the semantic alignment between visual and textual features through a gated cyclic mechanism. Experiments on the CLEVR-Change, CLEVR-DC, and Spot-the-Diff datasets show that the representation interaction and cycle consistency model outperforms the intra- and inter-representation interaction model on all evaluation metrics. The experimental results indicate that the proposed method calibrates the semantic consistency between difference representation learning and language generation and aligns the association between visual and textual features, thus further improving the quality of the generated sentences.

(3) This dissertation proposes Transformer-based models for change captioning. The Transformer framework has made significant progress in natural language processing and computer vision. To demonstrate the generalization ability of the proposed approach across frameworks, this dissertation proposes two Transformer-based models: one for the intra- and inter-representation interaction network and one for the representation interaction and cycle consistency network. Specifically, the recurrent decoding structure is replaced with a fully attentive multi-layer decoding structure. Experiments on the CLEVR-Change, CLEVR-DC, and Spot-the-Diff datasets report the performance of the Transformer-based models alongside the two original models. The experimental results indicate that the proposed representation interaction and cycle consistency network generalizes well to the Transformer framework. The proposed approach is more competitive than the state-of-the-art methods.

(4) This dissertation designs and implements a system for the automatic generation of semantic change descriptions. Based on the proposed representation interaction and cycle consistency algorithm, this dissertation develops an application system using the Sanic and Android frameworks, with separate front and back ends. The system accepts images from real environments through two input modes, local upload and camera capture, and automatically generates captions describing the change. The developed system confirms that this technology can be deployed in real-life scenarios.
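
The idea of matching features across an image pair before taking their difference, as described in contribution (1), can be illustrated with a small sketch. This is not the network proposed in the dissertation: it substitutes plain scaled dot-product attention for the multi-level matching, and the function names (`match_and_diff`, `softmax`, `dot`) and the toy two-dimensional region features are assumptions made for illustration only. It shows why matching first can suppress irrelevant changes: a region that merely moved is matched to its counterpart and leaves almost no residual, while a region with no counterpart leaves a large one.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def match_and_diff(before, after):
    """Softly match each 'before' region to the 'after' regions, then subtract.

    before, after: lists of region feature vectors (hypothetical inputs).
    Returns one residual vector per 'before' region.
    """
    dim = len(before[0])
    scale = math.sqrt(dim)
    residuals = []
    for query in before:
        # Attention weights of this region over all regions of the other image.
        weights = softmax([dot(query, key) / scale for key in after])
        # Weighted average of 'after' features: the soft-matched counterpart.
        matched = [
            sum(w * key[i] for w, key in zip(weights, after))
            for i in range(dim)
        ]
        # What remains after removing the matched counterpart.
        residuals.append([q - m for q, m in zip(query, matched)])
    return residuals


# Same two regions in a different order (a stand-in for viewpoint shift).
before = [[10.0, 0.0], [0.0, 10.0]]
reordered = [[0.0, 10.0], [10.0, 0.0]]
# Second region has disappeared (a genuine semantic change).
object_removed = [[10.0, 0.0]]

residual_reordered = match_and_diff(before, reordered)
residual_removed = match_and_diff(before, object_removed)
```

In this toy setting, the reordered pair yields residuals close to zero for both regions, whereas the pair with a missing region yields a clearly non-zero residual for the region that lost its counterpart. A direct element-wise subtraction of the two feature lists would instead report a large difference for the reordered pair.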

Keywords/Search Tags:

Change captioning, Intra-representation interaction, Inter-representation interaction, Cycle consistency, Transformer

Related items

1	Research On Feature Interaction Measurement And Interaction Form Recognition In Complex Models
2	Research On Automatic Text Classification Methods Based On Neural Interaction Representation Under The Hierarchical Structure
3	A Study Of Sparse Representation And Low-Rank Representation For Hyperspectral Band Selection
4	Sparse Representation Based Gesture Recognition And Multi-fingered Hand Interaction
5	Research On Inter-domain Controller Interaction Of SDN And Its Application
6	Research On Person Re-identification Algorithm Based On Local Information Interaction Enhancement And Inter-domain Fusion And Intra-domain Style Normalization
7	On The Cycle Structure Of The NFSR And The Construction Of The M-sequence
8	Image Difference Captioning With Instance-level Fine-grained Feature Representation
9	Des chiffres et des etres. Etude introductive a l'identification de la representation sociale de la statistique chez des etudiants de premier cycle en Sciences humaines et sociales en France
10	Research On Transformer-based Object Detection With Local And Global Interaction