
Visual Relationship Detection Based On Deep Learning

Posted on: 2022-01-15    Degree: Master    Type: Thesis
Country: China    Candidate: W T Wang    Full Text: PDF
GTID: 2518306740982739    Subject: Computer Science and Technology
Abstract/Summary:
Visual relationship detection aims to provide a comprehensive understanding of an image by describing all the objects in the scene and how they relate to each other, in <object-predicate-object> form. It is useful for a wide range of image understanding tasks, such as captioning, retrieval, reasoning, and visual question answering.

Visual relationship detection datasets exhibit a long-tail distribution: the predicate classes are highly imbalanced, so some predicates between object pairs have very few training samples. Accurately recognizing all predicates therefore remains a difficult problem. Most models perform well only on high-frequency predicates, while their accuracy on low-frequency predicates is low, which limits the practical application of visual relationship detection.

This thesis focuses on visual relationship detection under a long-tail distribution. Compared with other long-tailed computer vision tasks, visual relationship detection has distinct characteristics. It is a multi-modal task: accurately recognizing a predicate requires not only visual and spatial information from the image but also semantic information, so the model must accurately extract features from the different modalities and fuse them. We observe two phenomena in visual relationship detection datasets: nonstandard labels and feature overlap. The VG dataset has no uniform annotation specification, which introduces a certain amount of noise, and because low-frequency predicates lack training samples, it is difficult for a model to extract their discriminative features. Both phenomena aggravate the effect of the long-tail distribution on model performance.

In this thesis, we reduce the impact of the long-tail distribution on visual relationship detection in two ways. On the one hand, we propose a one-shot learning task for visual relationship detection to address the lack of training samples in visual relationship datasets: a feature-level attention network addresses feature sparsity, and a dual graph module enhances intra-class similarity and inter-class difference to achieve one-shot learning. On the other hand, we propose a visual relationship detection network for imbalanced classes, which improves the detection of low-frequency predicates without degrading performance on high-frequency predicates; a memory-feature method enhances the features of low-frequency relations to improve detection. Experimental results on several datasets show that the proposed networks perform well.
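To make the class-imbalance setting concrete, the sketch below illustrates one common way (not necessarily the method of this thesis) to counter a long-tailed predicate distribution: weighting the classification loss by the inverse "effective number" of samples per predicate, in the style of class-balanced losses. The predicate counts are hypothetical and chosen only to mimic a long tail.

```python
# Minimal illustration of class-balanced loss weights for long-tailed
# predicate classification. The counts below are hypothetical.
predicate_counts = {"on": 50000, "has": 30000, "wearing": 4000,
                    "riding": 800, "eating": 120, "painted on": 15}

def inverse_frequency_weights(counts, beta=0.999):
    """Compute per-predicate loss weights from sample counts.

    Uses the 'effective number of samples' (1 - beta^n) / (1 - beta),
    so rare tail predicates receive larger weights than head predicates.
    Weights are normalized to average 1 across classes.
    """
    weights = {}
    for pred, n in counts.items():
        effective = (1.0 - beta ** n) / (1.0 - beta)
        weights[pred] = 1.0 / effective
    mean_w = sum(weights.values()) / len(weights)
    return {p: w / mean_w for p, w in weights.items()}

weights = inverse_frequency_weights(predicate_counts)
# Tail predicates such as "painted on" get a much larger weight than
# head predicates such as "on", shifting the loss toward rare classes.
```

Such weights can be passed directly to a weighted cross-entropy loss during training, so gradient updates are no longer dominated by the few high-frequency predicates.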
Keywords/Search Tags: Visual Relationship Detection, Long-Tailed, Deep Learning, Class-Imbalanced