Human-Object Interaction Detection Based On Pose Estimation

Posted on: 2021-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: W Liu
GTID: 2518306050470874
Subject: Circuits and Systems
Abstract/Summary:
To understand the visual world more deeply, a computer must not only detect individual objects in a complex scene but also infer the relationships between them. Among the many kinds of visual relationships, we focus on those between humans and other objects in the scene. Human-object interaction (HOI) detection is a computer vision task that identifies the interactions between humans and objects, and it has great practical value and potential. HOI detection is usually formulated as the detection of <human, verb, object> triples: humans and objects are detected first, and the interactions between candidate human-object pairs are then classified. Because there are many kinds of objects, and each kind is associated with a variety of interactions, the task is very difficult.

The classic HOI detection method uses a multi-stream structure consisting of a human stream, an object stream and a spatial stream. The human and object streams process the appearance features of the human and the object, and the spatial stream processes the spatial relationship between them. Extended algorithms additionally incorporate context, knowledge and human pose information. Although these methods improve detection performance to some extent, they all treat the human as a whole and do not consider how different regions of the human body influence the interaction. In fact, the human body is composed of many body parts. When a human interacts with an object, the states of these body parts change simultaneously and affect the interaction differently. Compared with coarse-level visual features of the whole human, the visual features of individual body parts and their potential associations with objects can provide more precise cues for HOI detection.

To address this limitation, this thesis combines a two-stage strategy with multi-level human features and proposes a two-stage, multi-level framework for HOI detection. The framework consists of a shared feature extraction module, an interactive judgment module and a multi-level interactive detection module. The interactive judgment module uses a three-stream structure, drawing on the visual features of the human and the object and on their spatial relationship, to judge whether an interaction exists, and it suppresses non-interactive human-object pairs. The multi-level interactive detection module determines the specific interaction category. It adds a fine-level body part stream to the three streams. This stream combines the fine-level features of different body parts with the distances between body parts and objects, complementing the human stream, which uses coarse-level human features, so that the two jointly support interaction detection.

Because body parts differ in their importance to an interaction, we further introduce a part attention mechanism. Based on the spatial relationship between the human and the object and the positional relationships among body parts, the part attention module computes attention scores that reflect the relative importance of each body part. Applying part attention to the fine-level body part stream strengthens the influence of important body parts and makes different interactions easier to distinguish. In addition, we improve how the prediction scores of the multi-level attention interactive detection module are fused, by adding a fusion coefficient and mixing fusion strategies according to the characteristics of the interactions. Experimental results show that, compared with existing methods, the proposed method effectively reduces missed and false detections of interactions and achieves higher average precision.
Keywords/Search Tags: Human-object interaction, Part attention mechanism, Two-stage strategy, Multi-level human features
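
The two-stage flow described in the abstract can be illustrated with a small sketch. It is not the thesis implementation: the abstract does not give the attention formula or the fusion rule, so everything concrete here is an assumption made for illustration. In particular, the function names (`part_attention`, `detect_interactions`), the parameters `interact_threshold` and `alpha`, the distance-based softmax used as attention, and the linear blend used as fusion are all hypothetical. The interactiveness score from the first stage is taken as a precomputed input.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def part_attention(part_centers, object_center, temperature=1.0):
    """Assumed attention rule: body parts closer to the object get more weight.

    The thesis derives attention from human-object and inter-part spatial
    relations; this distance-based softmax is only a stand-in.
    """
    dists = [math.dist(p, object_center) for p in part_centers]
    return softmax([-d / temperature for d in dists])


def attended_part_scores(part_scores, attention):
    """Attention-weighted sum of per-part verb scores (fine-level stream)."""
    n_verbs = len(part_scores[0])
    return [
        sum(a * scores[v] for a, scores in zip(attention, part_scores))
        for v in range(n_verbs)
    ]


def detect_interactions(pair, interact_threshold=0.5, alpha=0.5):
    """Two-stage sketch for one human-object pair.

    Stage 1: drop the pair if its interactiveness score is too low.
    Stage 2: blend coarse human-stream scores with attended part scores.
    `alpha` is a hypothetical fusion coefficient (1.0 = coarse only).
    """
    if pair["interactiveness"] < interact_threshold:
        return None  # suppressed as a non-interactive pair

    attention = part_attention(pair["part_centers"], pair["object_center"])
    fine = attended_part_scores(pair["part_scores"], attention)
    coarse = pair["human_scores"]
    return [alpha * c + (1.0 - alpha) * f for c, f in zip(coarse, fine)]


# Toy pair: two body parts, two verbs; the first part sits on the object.
example_pair = {
    "interactiveness": 0.9,
    "part_centers": [(0.0, 0.0), (10.0, 0.0)],
    "object_center": (0.0, 0.0),
    "part_scores": [[0.9, 0.1], [0.2, 0.8]],
    "human_scores": [0.5, 0.5],
}
verb_scores = detect_interactions(example_pair)
```

In this toy pair the coarse human stream is undecided between the two verbs, and the part nearest the object tips the fused score toward the first verb. That is the kind of fine-level cue the abstract attributes to the body part stream.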