Research On Semantic Description Method Of Image Changes Based On Representation Interaction And Cycle Consistency

Posted on: 2024-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: S B Yue
GTID: 2568307112952069
Subject: Communication and Information System
Abstract/Summary:
Humans can perceive changes in a dynamic environment and convey relevant information about them, but this is arduous for computer systems. Recent work attempts to describe the detailed semantic changes between a pair of images in a natural-language sentence, a task known as change captioning. It is a novel branch of cross-media intelligence. Compared with conventional image captioning, change captioning is more complex and challenging: it requires not only understanding the visual content of each image but also locating the differences between the two images and generating a sentence that summarizes them. It has widespread applications, such as fault detection, video surveillance, medical diagnosis, and multimedia human-computer interaction. Existing work is mainly based on the neural encoder-decoder framework. Despite significant progress, three limitations remain: 1) fine-grained interaction within each image of the pair; 2) learning the difference representation between images under the distraction of irrelevant changes; 3) semantic association between the visual features and the visual words. To address these issues, the main contributions are as follows.

(1) This dissertation proposes an intra- and inter-representation interaction network for change captioning. In a dynamic world, the model needs to go beyond general visual perception and describe the actual change from different perspectives. Previous approaches lack an understanding of object relations within images and fine-grained mechanisms for matching objects between images, so they learn a false difference representation. To tackle this problem, this dissertation proposes an intra- and inter-representation interaction network that learns a reliable difference representation under the distraction of irrelevant changes. At its core, the intra-representation interaction network models the comprehensive semantic-positional interactions within each image, which not only helps the model explore fine-grained semantic change but can also serve as prior knowledge for dealing with viewpoint change; the inter-representation interaction network matches multi-level correspondences (feature, position-interaction, semantic-interaction) between images from coarse to fine to learn the semantic change. The proposed approach outperforms state-of-the-art methods on the existing change captioning benchmarks CLEVR-Change, CLEVR-DC and Spot-the-Diff. The experimental results validate that the intra- and inter-representation interaction mechanisms can effectively construct a reliable difference representation.

(2) This dissertation proposes a representation interaction and cycle consistency network for change captioning. A well-aligned visual-textual semantic association is crucial for generating accurate captions. Existing approaches rely only on attention mechanisms to align visual and textual features, which is insufficient to map change features to their target words. Therefore, building on the intra- and inter-representation interaction model above, this dissertation further proposes a representation interaction and cycle consistency network. The model introduces an additional semantic consistency constraint over the visual-word-visual cycle, correcting the semantic alignment between visual and textual features through a gated cyclic mechanism. Experiments on the CLEVR-Change, CLEVR-DC and Spot-the-Diff datasets show that the representation interaction and cycle consistency model outperforms the intra- and inter-representation interaction model on all evaluation metrics. The experimental results indicate that the proposed method calibrates the semantic consistency between difference representation learning and language generation and aligns the association between visual and textual features, thus further improving the quality of the generated sentences.

(3) This dissertation proposes Transformer-based models for change captioning. The Transformer framework has made significant progress in natural language processing and computer vision. To demonstrate that the proposed approach generalizes across frameworks, this dissertation proposes two Transformer-based models: one for the intra- and inter-representation interaction network and one for the representation interaction and cycle consistency network. Specifically, the recurrent decoding structure is replaced with a fully attentive multi-layer decoding structure. Experiments on the CLEVR-Change, CLEVR-DC and Spot-the-Diff datasets report the performance of the Transformer-based models alongside the original two models. The experimental results indicate that the proposed representation interaction and cycle consistency network generalizes well to the Transformer framework and that the proposed approach is competitive with state-of-the-art methods.

(4) This dissertation designs and implements a system for automatically generating descriptions of semantic change. Based on the proposed representation interaction and cycle consistency algorithm, an application system is developed with the Sanic and Android frameworks, with separate front and back ends. The system accepts images from realistic environments through two types of image input, local upload and camera capture, and automatically generates captions about the change. The developed system confirms that this technology can be deployed in real-life scenarios.
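As a rough illustration of why matching objects between images matters for the difference representation, the sketch below compares a naive position-wise subtraction with a softly matched subtraction on toy region features. It is not the thesis's network: the function names (`naive_difference`, `matched_difference`), the dot-product similarity, the `temperature` value, and the one-hot toy features are all assumptions made for this example.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(dot(v, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def naive_difference(before, after):
    """Subtract features at the same region index (no matching)."""
    return [[b - a for b, a in zip(bv, av)] for bv, av in zip(before, after)]


def matched_difference(before, after, temperature=0.1):
    """For each 'before' region, attend over all 'after' regions by
    dot-product similarity and subtract the attention-weighted feature."""
    diffs = []
    for bv in before:
        weights = softmax([dot(bv, av) / temperature for av in after])
        matched = [
            sum(w * av[k] for w, av in zip(weights, after))
            for k in range(len(bv))
        ]
        diffs.append([b - m for b, m in zip(bv, matched)])
    return diffs


# Hypothetical toy features: three regions, one-hot object descriptors.
before = [
    [1.0, 0.0, 0.0, 0.0],  # object A
    [0.0, 1.0, 0.0, 0.0],  # object B
    [0.0, 0.0, 1.0, 0.0],  # object C
]
# In 'after', A and B have swapped regions (a distractor, similar in
# spirit to a viewpoint shift) and C has been replaced by a new object D.
after = [
    [0.0, 1.0, 0.0, 0.0],  # object B
    [1.0, 0.0, 0.0, 0.0],  # object A
    [0.0, 0.0, 0.0, 1.0],  # object D
]

naive_norms = [norm(d) for d in naive_difference(before, after)]
matched_norms = [norm(d) for d in matched_difference(before, after)]
print("naive:  ", naive_norms)
print("matched:", matched_norms)
```

In this toy setup the naive subtraction gives the same magnitude for all three regions, so the distractor is indistinguishable from the real change, while the matched subtraction is close to zero for the two swapped objects and large only for the replaced one.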
Keywords/Search Tags: Change captioning, Intra-representation interaction, Inter-representation interaction, Cycle consistency, Transformer