
Scene Graph Generation Based On Context

Posted on: 2021-04-29  Degree: Master  Type: Thesis
Country: China  Candidate: X Lin  Full Text: PDF
GTID: 2428330605974777  Subject: Computer Science and Technology
Abstract/Summary:
Scene graph generation is a recently emerging task in computer vision. As an abstraction of the complex relationships between objects in an image, a scene graph provides structured information and can be widely applied in high-level visual tasks such as image captioning, visual question answering, image retrieval and image generation. Context refers to the interactive information in a scene. This thesis focuses on modeling context in scene graph generation from three aspects: local context generation, global context generation and multi-level global context fusion. The main contributions are as follows:

(1) To address the visual diversity of visual relationship representations while enhancing the original visual features, and given that local context can further abstract the information of objects or local areas in a scene and improve scene perception and object recognition in visual understanding, this thesis proposes a local context-based scene graph generation method (Node-relation). The method uses an attention mechanism in place of the traditional message passing scheme. In node message pooling, attention weights are applied to the inbound and outbound edges (the subject-predicate and object-predicate hidden vectors). In edge message pooling, dynamic attention maps are computed over the visual feature maps of the subject and object, and a weighted sum of the subject and object information forms the local context message for the predicate. Experiments on the Visual Genome (VG) data set show that Node-relation improves the visual representation of objects: because local context information is better used, objects with similar appearance are distinguished more effectively, so the generated scene graph is more accurate.

(2) To reduce the effect of data set bias on scene graph generation, and motivated both by the fact that global context information better expresses the global topic information of a visual scene for joint reasoning and by the mechanism through which the human brain constructs a scene graph from global context, this thesis proposes a scene graph generation method based on a residual shuffle global context (Residual Shuffle Sequence, RSSQ). The method centers on a bi-directional LSTM (biLSTM) architecture with a residual shuffle module. In this module, a random shuffle operation alleviates the effect of data set bias on scene graph generation, and a residual connection allows the global edge context to be shared between different biLSTM layers. In addition, relative position context is supplemented by an explicit position embedding. Experiments show that RSSQ generates better scene graphs in the high- and middle-frequency bands of the VG data set.

(3) To make full use of the complementary advantages of global context information from different aspects in topic representation, and to simulate the multi-angle, multi-level mechanism by which the human brain comprehends scene content, this thesis proposes a global context-based scene graph generation method built on module attention fusion (Module-att). By designing a low-cost modular attention network, the method obtains a fused global context from the chain-structured context produced by the residual shuffle biLSTM, the fully connected context produced by multi-head self-attention, and the relative position context. Ablation and comparative experiments show that Module-att can fuse different kinds of visual scene context, improving the representation of visual relationship context and mitigating data set bias, so that scene graph generation performance on the VG data set is effectively improved.

In short, through research on extracting local, global and multi-level global contexts, scene graph generation performance improves in three respects: visual diversity expression, data set bias, and multi-level topic information of scene structure. The feasibility, reliability and superiority of the proposed methods are also demonstrated on the VG data set.
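
The attention-weighted node message pooling described in contribution (1) can be illustrated with a minimal sketch. The abstract does not specify how the attention weights are computed, so the dot-product scoring, the softmax normalization, and the names `pool_node_message`, `node_state` and `edge_states` are all assumptions made for illustration; this is not the thesis's implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pool_node_message(node_state, edge_states):
    """Pool the hidden vectors of a node's inbound and outbound edges.

    Assumption: each edge is scored by its dot product with the node's
    hidden vector, and the scores are normalized with a softmax. The
    message is the attention-weighted sum of the edge hidden vectors.
    """
    dim = len(node_state)
    if not edge_states:
        # A node with no incident edges receives an empty (zero) message.
        return [0.0] * dim
    weights = softmax([dot(node_state, edge) for edge in edge_states])
    return [
        sum(w * edge[i] for w, edge in zip(weights, edge_states))
        for i in range(dim)
    ]


# Hypothetical 2-dimensional hidden vectors for one node and two edges.
node = [1.0, 0.0]
edges = [[1.0, 0.0], [0.0, 1.0]]
message = pool_node_message(node, edges)
```

Here the first edge is more similar to the node state, so it receives the larger attention weight and dominates the pooled message. A learned scoring function would normally replace the plain dot product.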
Keywords/Search Tags: Scene Graph Generation, Context, Visual Relationship, Deep Learning, Attention Mechanism