Scene Graph Generation for Image Semantic Understanding and Representation

Posted on: 2022-10-28 | Degree: Doctor | Type: Dissertation | Country: China | Candidate: H Zhou | GTID: 1528307169476534 | Subject: Control Science and Engineering

Abstract/Summary:

With the growth of visual data, the understanding of digital images plays an increasingly important role in the new generation of artificial intelligence. Semantic understanding and representation of images can accelerate the transformation from surface-level perception to deep cognition of visual scenes. As an emerging hot topic in computer vision, scene graph generation (SGG) builds a structured representation of an image by exploring the relationships among its objects. SGG thus extracts the semantic information of images and lays a solid foundation for visual perception and reasoning tasks. This dissertation systematically studies object-level image semantic segmentation, relationship-level scene graph generation, and attribute-level object attribute learning in the scene graph, drawing on methods from statistical theory and causal reasoning. The main contributions and innovations of this dissertation are as follows.

1. Motivated by traditional edge detection operators and conditional random fields, we propose an image semantic segmentation method based on a difference pooling module and a double pyramid module to alleviate edge blur and missing object parts; it enhances edge gradient features and establishes long-distance dependencies between pixels. Based on the Roberts operator, the difference pooling module extracts edge gradient features to obtain finer object boundary segmentation. To strengthen the connections between pixels and improve object integrity, the double pyramid module forms a sparse fully connected conditional random field. In addition, we design a weakly supervised semantic segmentation algorithm that uses object bounding box annotations instead of pixel-level labels, where the accuracy of the segmentation masks in the training phase is improved by a random label transformation algorithm. Experimental results show the superiority of our model on both object boundaries and object parts.
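To make the difference pooling idea concrete, here is a minimal sketch in PyTorch, assuming the two Roberts cross kernels are applied as fixed depthwise convolutions and the pooled edge gradient magnitude is added to a standard max-pooled feature map; the module name, the additive fusion, and the pooling choice are illustrative assumptions rather than the dissertation's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DifferencePooling(nn.Module):
    """Hypothetical sketch: Roberts-cross difference pooling over a feature map."""

    def __init__(self, channels: int):
        super().__init__()
        self.channels = channels
        # Two fixed 2x2 Roberts cross kernels, applied depthwise (one pair per channel).
        k1 = torch.tensor([[1.0, 0.0], [0.0, -1.0]])
        k2 = torch.tensor([[0.0, 1.0], [-1.0, 0.0]])
        weight = torch.stack([k1, k2]).unsqueeze(1)        # (2, 1, 2, 2)
        self.register_buffer("weight", weight.repeat(channels, 1, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Depthwise convolution with the Roberts kernels -> two difference maps per channel.
        diff = F.conv2d(F.pad(x, (0, 1, 0, 1)), self.weight, groups=self.channels)
        gx, gy = diff[:, 0::2], diff[:, 1::2]
        grad = torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)        # edge gradient magnitude
        # Pool the gradient response and fuse it with ordinary max pooling (assumed fusion).
        return F.max_pool2d(x, 2) + F.max_pool2d(grad, 2)
```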
2. Two scene graph generation methods for mining sparse relationships are proposed to avoid the high computational complexity and feature redundancy of dense graphs. By designing a candidate relationship proposal mechanism, training and inference time is reduced and the efficiency of message passing is improved. We first propose a relationship-aware primal-dual graph attention network that prunes the dense graph: a trainable relationship distance network constructs the sparse graph, and the interaction between the primal and dual graphs preserves contextual cues and neighboring dependencies for objects and relationships. Besides, we propose a knowledge-embedding sparse graph attention network to improve the robustness and accuracy of the sparse graph, which can adaptively generate more reasonable sparse structures in different scenes. Meanwhile, a feature aggregation and update scheme based on graphical message passing learns the sparse knowledge graph by introducing prior statistical probabilities. Experimental results show that both methods significantly increase inference speed and improve SGG performance.
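The candidate relationship proposal mechanism can likewise be illustrated with a hedged sketch: a small trainable scoring network (standing in for the relationship distance measure) rates every object pair, and only the top-K pairs become edges of the sparse graph. The network architecture, feature layout, and the value of K are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class RelationshipProposal(nn.Module):
    """Hypothetical sketch: score all object pairs with a small trainable
    network and keep only the top-K pairs as edges of a sparse graph."""

    def __init__(self, obj_dim: int = 512, hidden: int = 256, top_k: int = 64):
        super().__init__()
        self.top_k = top_k
        # Trainable "relationship distance" measure over concatenated pair features.
        self.score_net = nn.Sequential(
            nn.Linear(2 * obj_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, obj_feats: torch.Tensor) -> torch.Tensor:
        # obj_feats: (N, obj_dim) features of the N detected objects in one image.
        n = obj_feats.size(0)
        subj = obj_feats.unsqueeze(1).expand(n, n, -1)      # subject features
        obj = obj_feats.unsqueeze(0).expand(n, n, -1)       # object features
        scores = self.score_net(torch.cat([subj, obj], dim=-1)).squeeze(-1)  # (N, N)
        # Exclude self-pairs, then keep the K highest-scoring candidate relationships.
        diag = torch.eye(n, dtype=torch.bool, device=scores.device)
        scores = scores.masked_fill(diag, float("-inf"))
        k = min(self.top_k, max(n * n - n, 0))
        idx = scores.flatten().topk(k).indices
        # Return edges as (2, K) subject/object index pairs of the sparse graph.
        return torch.stack([torch.div(idx, n, rounding_mode="floor"), idx % n], dim=0)
```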
3. To analyze the contextual bias in SGG, we propose a debiased SGG method for dual imbalance learning via a causal graph, which learns unbiased relationship features and improves the performance on rare relationships. In practice, SGG datasets are often dually imbalanced, and existing methods ignore the background-foreground imbalance, which results in a biased model. We first analyze the potential causes of the dual imbalance problem in SGG. Then, a causal graph of content and context is designed to remove the context bias and learn unbiased relationship features via a causal intervention tree. Meanwhile, to learn a more discriminative representation of the foreground by expanding the foreground feature space, a biased resistance loss decouples background classification from foreground relationship recognition. Extensive experiments demonstrate that our model outperforms other state-of-the-art methods, especially on rare relationship categories.

4. Unlike previous models that may establish wrong causal connections between object categories and relationships, we decompose object features and propose a robust scene graph generation model based on counterfactual causal mining, which strengthens the essential content of relationship prediction and reduces the side effects of class-generic features. Relationships should be the semantic reflection of interactions between objects rather than statistical dependencies between object categories. We first decompose object features into class-generic and object-specific components and analyze the spurious causality in the current SGG framework through a causal graph. Then, a counterfactual training model enhances the influence of object-specific features and generates the scene graph from the essence of the relationship (an illustrative sketch of this decomposition appears after contribution 5 below). Besides, we design a multi-hierarchy debiased loss for classifier learning, which learns a hierarchical classification model from coarse-grained to fine-grained semantics. Experimental results show that our method performs better on all metrics, and the robustness and generalization of the model are greatly improved.

5. To extend object attribute prediction in the scene graph, we propose a multi-label object attribute classification method for SGG. To handle the challenge that objects in an image may have one or more attribute labels, we regard attribute prediction as a multi-label classification task and design a multi-classifier model that accurately predicts both single attribute labels and the whole set of attribute labels. In addition, by combining the object, relationship, and attribute features from the scene graph, we apply these multiple semantic features to the visual question answering task to verify the effectiveness of the proposed methods.
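For the feature decomposition in contribution 4, the following sketch illustrates one plausible reading, not the dissertation's exact model: relationship logits are computed from both class-generic (label embedding) and object-specific (visual) components, and a counterfactual pass with the visual component muted is subtracted so that predictions emphasize what the visual content adds beyond category statistics. All layer names, dimensions, and the subtraction form are assumptions.

```python
import torch
import torch.nn as nn


class CounterfactualRelPredictor(nn.Module):
    """Hypothetical sketch: debias relationship logits by removing the
    class-generic (label-only) contribution via a counterfactual pass."""

    def __init__(self, num_classes: int, num_rels: int, vis_dim: int = 512, dim: int = 256):
        super().__init__()
        # Class-generic component: depends only on the predicted object categories.
        self.label_embed = nn.Embedding(num_classes, dim)
        # Object-specific component: depends on the visual appearance features.
        self.vis_proj = nn.Linear(vis_dim, dim)
        # Relationship classifier over fused subject/object pair features.
        self.rel_cls = nn.Linear(4 * dim, num_rels)

    def pair_logits(self, s_vis, o_vis, s_lbl, o_lbl):
        # s_lbl / o_lbl are LongTensors of predicted class indices.
        feats = torch.cat([self.vis_proj(s_vis), self.vis_proj(o_vis),
                           self.label_embed(s_lbl), self.label_embed(o_lbl)], dim=-1)
        return self.rel_cls(feats)

    def forward(self, s_vis, o_vis, s_lbl, o_lbl):
        # Factual prediction: both object-specific and class-generic cues.
        factual = self.pair_logits(s_vis, o_vis, s_lbl, o_lbl)
        # Counterfactual prediction: mute the object-specific (visual) cues.
        counterfactual = self.pair_logits(torch.zeros_like(s_vis),
                                          torch.zeros_like(o_vis), s_lbl, o_lbl)
        # Keep what the visual content adds beyond category co-occurrence statistics.
        return factual - counterfactual
```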
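For the multi-label attribute prediction in contribution 5, here is a generic sketch under the usual assumption that each attribute is an independent binary decision trained with binary cross-entropy; the head architecture and the decision threshold are illustrative, not the dissertation's multi-classifier design.

```python
import torch
import torch.nn as nn


class AttributeClassifier(nn.Module):
    """Hypothetical sketch: multi-label attribute prediction for detected objects."""

    def __init__(self, obj_dim: int = 512, num_attrs: int = 200):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(obj_dim, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, num_attrs),    # one logit per attribute
        )
        # Each attribute is treated as an independent binary decision.
        self.criterion = nn.BCEWithLogitsLoss()

    def forward(self, obj_feats: torch.Tensor) -> torch.Tensor:
        return self.head(obj_feats)       # (N, num_attrs) logits

    def loss(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # targets: (N, num_attrs) multi-hot labels; an object may carry several attributes.
        return self.criterion(logits, targets.float())


# Usage sketch: attributes whose sigmoid score exceeds a threshold are predicted, so an
# object can receive one label, several labels, or none.
# probs = torch.sigmoid(model(obj_feats)); predicted = probs > 0.5
```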
Keywords/Search Tags: Image Semantic Understanding and Representation, Scene Graph Generation, Relationship Prediction, Image Semantic Segmentation, Object Attributes, Visual Question Answering, Weakly Supervised Learning, Sparse Graph, Causal Intervention