Research On Visual Relationship Detection Based On Deep Learning

Posted on: 2024-09-02
Degree: Master
Type: Thesis
Country: China
Candidate: M Chen
Full Text: PDF
GTID: 2568307079970849
Subject: Electronic information
Abstract/Summary:
Visual relationship detection provides high-level scene understanding for images and other visual modalities. Beyond raw visual features, it supplies rich, high-level semantic representations for many downstream tasks (such as visual captioning, visual question answering, and image generation). It has a wide range of applications and is regarded as one of the key tasks in computer vision. Scene Graph Generation and Human-Object Interaction Detection are its two basic tasks, and the central challenge in both is obtaining more accurate relationship representations. This thesis proposes two novel methods, one for each task.

For the Scene Graph Generation task, this thesis proposes a Multi-Scale Graph Attention Network. Research on scene graph generation has progressed from early work on learning accurate semantic embeddings of predicates to recent work that introduces context information to enrich the representations of instances and relations, and these methods have achieved certain results in experiments. However, they do not exploit the salient instances in the image, or the relationships between them, when generating the target scene graph. To overcome this limitation, this thesis proposes an innovative multi-scale feature aggregation framework. First, an encoder-decoder structure on the instance graph identifies the salient instances in the image and enhances their features. Then the important relationships between the instances are inferred. The framework consists of two key sub-modules: a multi-scale message passing module and a relational filtering module. Extensive experiments show that the proposed Multi-Scale Graph Attention Network effectively integrates and learns rich context information, improving the accuracy of scene graph recognition.

For the Human-Object Interaction Detection task, this thesis proposes a feature-decoupled two-stream attention network. Previous research has explored many effective methods for improving performance on this task. However, most of them introduce additional knowledge to aid interaction prediction from the perspective of information flow. This thesis instead proposes a novel method for decoupling the features of interest in image regions. By analyzing and exploiting the existing features more deeply, it performs high-level feature aggregation with a two-stream attention network and improves the model's interaction recognition ability without introducing additional knowledge. The model consists of a feature decoupling module and a two-stream attention network module. Extensive experiments and visualizations show that the proposed method significantly improves performance on the human-object interaction detection task.
Keywords/Search Tags: Visual Relation Detection, Scene Graph Generation, Human-Object Interaction Detection, Multi-Scale Framework, Attention Mechanism