Research On Visual Object Detection And Relationship Understanding

Posted on:2022-07-02

Degree:Master

Type:Thesis

Country:China

Candidate:Joya Chen

GTID:2518306323462474

Subject:Computer application technology

Abstract/Summary:

Visual object detection and understanding have long been core problems in artificial intelligence, and industry has strong demand for both. Visual object detection answers two questions about visual objects, "where" and "what": it locates each object with a bounding box and classifies its category. Given accurate detection results, visual relationship understanding reveals the semantic relationships between visual objects. These relationships usually take the form of predicates, typically verbs, that link two objects into a subject-predicate-object triplet <Visual Object A - Visual Relationship - Visual Object B>. This dissertation focuses on these two problems. The former is the premise of the latter, and the latter is the path toward high-level visual object understanding. Following this technical route, our research contents and contributions are as follows:

(1) For visual object detection, this dissertation studies how to accurately identify visual objects in a visual scene. To address the imbalance between foreground and background samples, we propose a Sampling-Free mechanism. It resolves the imbalance through optimal bias initialization and an adaptive guided loss, which avoids laborious resampling and reweighting strategies. Experiments on multiple datasets show that the Sampling-Free mechanism accelerates model training and effectively improves the accuracy of multiple visual object detection algorithms.

(2) After accurately detecting visual objects, this dissertation further studies semantic relationship understanding between them. As a novel direction, we target the problem of expressing visual relationships in dynamic visual scenes. For video content retrieval, we propose CrossGraphAlign, a video content-oriented visual-text relationship alignment method. In this method, the text is represented as a text relation graph and the video as multiple visual relation graphs. An attention mechanism matches these graphs, which makes it possible to retrieve specific video segments from text relationships. Experiments on several datasets show that CrossGraphAlign effectively aligns visual relationships with text relationships. It also substantially improves the recall of the video content retrieval system.
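The triplet form described in the abstract can be sketched in Python as a directed relation graph. In this sketch, detected objects are nodes and predicates label the edges. The class names and the instance-style node labels (person_1, horse_1) are hypothetical and do not come from the thesis.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """One <subject - predicate - object> visual relationship."""
    subject: str
    predicate: str
    obj: str


class RelationGraph:
    """Directed graph: detected object instances are nodes, predicates label edges."""

    def __init__(self):
        # subject instance -> list of (predicate, object instance) pairs
        self._edges = defaultdict(list)

    def add(self, t: Triplet) -> None:
        self._edges[t.subject].append((t.predicate, t.obj))

    def relations_of(self, subject: str) -> list:
        # .get avoids inserting an empty entry for unknown subjects
        return list(self._edges.get(subject, []))

    def triplets(self) -> list:
        return [Triplet(s, p, o) for s, pairs in self._edges.items() for p, o in pairs]


g = RelationGraph()
g.add(Triplet("person_1", "rides", "horse_1"))
g.add(Triplet("person_1", "holds", "rope_1"))
print(g.relations_of("person_1"))  # [('rides', 'horse_1'), ('holds', 'rope_1')]
```

The nodes are keyed by instance rather than by category label. This keeps two detected people from collapsing into one node.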
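The abstract names the two ingredients of Sampling-Free, optimal bias initialization and an adaptive guided loss, but gives no formulas. The sketch below shows one plausible reading under stated assumptions. It assumes a single binary (foreground vs. background) sigmoid classifier whose logits all start at the bias b. Minimizing the initial binary cross-entropy L(b) = -N_f log σ(b) - N_b log(1 - σ(b)) gives σ(b) = N_f / (N_f + N_b), that is, b = log(N_f / N_b). For the loss, the guided weight is assumed to be the ratio of regression loss to classification loss, held constant during backpropagation. The function names are hypothetical, and the thesis's multi-class formulation may differ.

```python
import math


def optimal_bias(num_fg: int, num_bg: int) -> float:
    """Bias b minimizing the initial sigmoid BCE when every logit equals b.

    Assumption: binary fg/bg classifier. Setting dL/db = 0 for
    L(b) = -N_f*log(sigmoid(b)) - N_b*log(1 - sigmoid(b))
    gives sigmoid(b) = N_f / (N_f + N_b), i.e. b = log(N_f / N_b).
    """
    return math.log(num_fg / num_bg)


def initial_bce(b: float, num_fg: int, num_bg: int) -> float:
    """Total BCE over all anchors when every logit equals b."""
    p = 1.0 / (1.0 + math.exp(-b))
    return -num_fg * math.log(p) - num_bg * math.log(1.0 - p)


def guided_scale(loss_reg: float, loss_cls: float) -> float:
    """Hypothetical guided weight: rescale L_cls to the magnitude of L_reg.

    In a real training loop this ratio would be detached (treated as a
    constant) so that no gradient flows through it.
    """
    return loss_reg / max(loss_cls, 1e-12)


# e.g. 100 foreground vs 100,000 background anchors in one batch
b = optimal_bias(100, 100_000)
print(round(b, 3))  # -6.908
```

RetinaNet's focal-loss initialization fixes the prior at π = 0.01 and sets b = -log((1 - π)/π). By contrast, this sketch estimates the prior from the actual foreground and background counts.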
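The abstract does not specify how CrossGraphAlign encodes its graphs, parameterizes attention, or aggregates scores. The following minimal sketch assumes pre-computed node embeddings, one vector per relation node. It uses parameter-free dot-product attention from each text node over the nodes of one visual relation graph, and averages the per-node cosine scores. Each video segment is represented by its own visual graph and segments are ranked by that score. Recall@K is then the fraction of queries whose correct segment appears in the top K. All function names and toy vectors are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v, eps=1e-12):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)) + eps)


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def graph_similarity(text_nodes, visual_nodes):
    """Match one text relation graph against one visual relation graph.

    Each text node attends over all visual nodes (softmax of dot products).
    The attention-weighted visual vector is compared with the text node by
    cosine similarity, and the per-node scores are averaged.
    """
    per_node = []
    for t in text_nodes:
        weights = softmax([dot(t, v) for v in visual_nodes])
        attended = [sum(w * v[d] for w, v in zip(weights, visual_nodes))
                    for d in range(len(t))]
        per_node.append(cosine(t, attended))
    return sum(per_node) / len(per_node)


def retrieve(text_nodes, segments, k=1):
    """Rank video segments (name -> visual graph node embeddings) for a query."""
    ranked = sorted(segments,
                    key=lambda name: graph_similarity(text_nodes, segments[name]),
                    reverse=True)
    return ranked[:k]


def recall_at_k(rankings, ground_truth, k):
    """Fraction of queries whose correct segment appears in the top k."""
    hits = sum(gt in ranked[:k] for ranked, gt in zip(rankings, ground_truth))
    return hits / len(ground_truth)


# Toy 3-d embeddings, one vector per relation node (hypothetical values).
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
segments = {
    "seg_a": [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]],
    "seg_b": [[0.0, 0.0, 1.0], [0.1, 0.0, 0.9]],
}
print(retrieve(query, segments, k=1))  # ['seg_a']
```

Here seg_a scores about 0.88 and seg_b about 0.03, because seg_a's relation nodes point in the same directions as the query's. A learned model would replace the raw dot product with trained projections.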

Keywords/Search Tags:

Visual Object Detection, Visual Relationship Understanding, Deep Learning, Foreground-background Imbalance, Relationship Graph

Related items

1	Research On Inter-and Intra-Image Visual Relationship Understanding
2	Research On Visual Relationship Detection Based On Deep Learning
3	Research On Visual Relationship Detection Based On Deep Learning
4	Visual Relationship Generation Based On Scene Understanding
5	Visual Relationship Detection Based On Deep Learning
6	The Research And Implementation Of Visual Description Platform On Visual Federation Relationship
7	Research On Visual Relationship Detection In Natural Scene
8	Research On Multimodal Information Oriented Relationship Detection
9	Computer Vision Object Relationship Detection Based On Deep Learning
10	Research On Image Caption Algorithm Based On Visual Relationship