Visual Grounding Via Accumulated Attention

Posted on:2020-04-03

Degree:Master

Type:Thesis

Country:China

Candidate:C R Deng

GTID:2428330611466000

Subject:Software engineering

Abstract/Summary:

Visual grounding (VG) aims to locate the most relevant object or region in an image based on a natural language query. Generally, it requires the machine to first understand the query, identify the key concepts in the image, and then locate the target object by specifying its bounding box. However, in many real-world visual grounding applications, we have to deal with ambiguous queries and images with complicated scene structures. Identifying the target from highly redundant and correlated information can be very challenging and often leads to unsatisfactory performance. To tackle this, in this paper we apply an attention module to each kind of information to reduce its internal redundancy. We then propose an accumulated attention (A-ATT) mechanism to reason among all the attention modules jointly. In this way, the correlations among the different kinds of information can be explicitly captured. Moreover, to improve the performance and robustness of our VG models, we introduce noise into the training procedure to bridge the distribution gap between the human-labeled training data and poor-quality real-world data. With this "noised" training strategy, we can further learn a bounding box regressor, which can be used to refine the bounding box of the target object. We evaluate the proposed methods on four popular datasets (namely ReferCOCO, ReferCOCO+, ReferCOCOg, and GuessWhat?!). The experimental results show that our methods significantly outperform all previous works on every dataset in terms of both speed and accuracy.
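
The abstract does not say what form the training noise takes or how the bounding box regressor is parameterized. As a rough illustration only, the sketch below assumes the noise is a random jitter applied to a human-labeled box, and that the regressor is trained to predict center-and-log-scale offsets of the kind commonly used in the R-CNN family of detectors. The function names (`jitter_box`, `regression_targets`, `apply_offsets`) and the `scale` parameter are hypothetical and do not come from the thesis.

```python
import math
import random


def jitter_box(box, scale=0.1, rng=random):
    """Perturb an (x, y, w, h) box by offsets proportional to its size.

    Assumption: this stands in for the "noise" added during training;
    the thesis may use a different perturbation.
    """
    x, y, w, h = box
    dx = rng.uniform(-scale, scale) * w
    dy = rng.uniform(-scale, scale) * h
    dw = rng.uniform(-scale, scale) * w
    dh = rng.uniform(-scale, scale) * h
    return (x + dx, y + dy, w + dw, h + dh)


def regression_targets(noisy, clean):
    """Offsets a regressor would learn: they map the noisy box onto the clean one."""
    nx, ny, nw, nh = noisy
    cx, cy, cw, ch = clean
    # Work with box centers so translation and scaling are decoupled.
    noisy_cx, noisy_cy = nx + nw / 2, ny + nh / 2
    clean_cx, clean_cy = cx + cw / 2, cy + ch / 2
    return (
        (clean_cx - noisy_cx) / nw,
        (clean_cy - noisy_cy) / nh,
        math.log(cw / nw),
        math.log(ch / nh),
    )


def apply_offsets(box, offsets):
    """Refine a box with predicted offsets (inverse of regression_targets)."""
    x, y, w, h = box
    tx, ty, tw, th = offsets
    new_w = w * math.exp(tw)
    new_h = h * math.exp(th)
    new_cx = x + w / 2 + tx * w
    new_cy = y + h / 2 + ty * h
    return (new_cx - new_w / 2, new_cy - new_h / 2, new_w, new_h)


if __name__ == "__main__":
    rng = random.Random(0)
    clean = (10.0, 20.0, 100.0, 50.0)   # human-labeled box
    noisy = jitter_box(clean, scale=0.1, rng=rng)
    targets = regression_targets(noisy, clean)
    print(noisy, targets, apply_offsets(noisy, targets))
```

In this setup, each training step would pair a jittered box with the offsets that undo the jitter, so that at test time the same offsets can refine an imperfect candidate box. With perfect predictions, `apply_offsets` recovers the labeled box exactly; a learned regressor would only approximate it.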

Keywords/Search Tags:

Visual Grounding, Accumulated Attention, Bounding box regression, Noised training strategy

Related items

1	Scale Adaptive Visual Object Tracking Based On Bounding Box Regression
2	Looking on the bright side: The effect of a positive visual attention training intervention on attention and emotion regulation
3	Research On Visual Dialog Technology Integrating Dialog History
4	Visual Grounding Based On Deep Learning
5	Research On Cross-model Speech Recognition System
6	Multimodal Fine-grained Interaction Modeling For Textual Video Grounding
7	Researches About Visual Attention: Algorithm Design, And System Implementation
8	Probability Graph Model Based Visual Attention Mechanism And Its Application
9	Large-scale Visual Relationship Detection Based On Hierarchical Training Strategy
10	Deep-learning-based Face Detection And Segmentation Using Iterative Bounding-box Regression