Research On Recognition And Detection Of Human-Object Interaction Behavior

Posted on: 2021-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: Q Lin
GTID: 2518306032979849
Subject: Electronics and Communications Engineering
Abstract/Summary:
In recent years, with the rapid development of computer vision, recognition tasks such as object detection, action recognition and segmentation have made rapid progress. However, to understand accurately what is happening in a scene, a computer also needs to recognize the interactions between people and objects, a task known as human-object interaction (HOI) detection and recognition. In most scenes and applications, humans are the center of interaction in image semantics, so determining the interaction relationship between people and other objects in the image is key to understanding image semantics in engineering applications. The HOI detection and recognition task addressed in this thesis is divided into two parts: object detection and HOI relationship detection. The thesis studies and improves on problems in current HOI detection and recognition models, and its main work consists of the following two parts.

(1) The task requires a large number of images to ensure model accuracy, and the objects in these images usually vary greatly in scale, which makes detection and recognition difficult. Inspired by the SNIP algorithm, this thesis improves on the Faster R-CNN algorithm and uses a three-branch parallel object detection network to generate scale-specific feature maps with uniform representational power. First, the structure of specific convolutional layers is modified: the single-branch convolutional layer of the original feature extraction network is replaced with three branches that use dilated convolutions with different dilation rates, forming a parallel multi-branch architecture in which the branches share the same parameters but have different receptive fields. Then, the three branches with different dilation rates are trained with scale-specific filtering, so that each branch detects targets in a different scale range. This method introduces no additional parameters and no additional computation, and it gives better detection results for target objects with large scale variation in the data set.

(2) Current HOI relationship detection models take the human body image directly as the network input and then detect the interaction between the person and the object. For action categories that do not involve any interacting object, or whose objects are far from the person, the detection results are not ideal. To address these limitations, and considering that the cues contained in the appearance of people or objects in the image can aid interaction detection, the HOI relationship detection model uses a three-stream network in which each stream extracts features from a different source. Each stream uses a context convolution module (CCM) to capture, from the input feature map, the contextual information related to the interaction between people and distant objects. An attention module incorporated into the three-stream network dynamically generates an attention map for each detected person or object instance, highlighting the regions relevant to the task. The experimental results show that the average accuracy of the model used in this thesis reaches 45.2%, which is 0.5 percentage points higher than the iCAN model (44.7%), the best-performing model compared against. The model also achieves the best results on both the complete default object and the common default object, where performance improves over the iCAN model by 0.72% and 1.08%, respectively.
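The shared-weight, multi-dilation idea in part (1) can be illustrated with a minimal one-dimensional sketch. This is not the thesis's implementation: the function names, the 3-tap kernel and the dilation rates (1, 2, 3) are assumptions chosen for illustration, since the abstract does not give the actual rates. The sketch only shows that one set of kernel weights, applied at different dilation rates, covers different receptive fields without adding parameters.

```python
from typing import Dict, List, Sequence


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """1-D dilated convolution (cross-correlation) with zero padding.

    The output has the same length as the input. An odd kernel size is
    assumed so that the kernel is centered on each position.
    """
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            # Taps are spaced `dilation` samples apart around position i.
            idx = i + (j - center) * dilation
            if 0 <= idx < len(signal):
                acc += weight * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Span of input samples covered by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def three_branch(signal: Sequence[float], kernel: Sequence[float],
                 dilations: Sequence[int] = (1, 2, 3)) -> Dict[int, List[float]]:
    """Apply the SAME kernel weights in every branch (hypothetical rates).

    Only the dilation rate differs between branches, so the number of
    parameters equals that of a single branch.
    """
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    shared_kernel = [1.0, 1.0, 1.0]
    for rate, response in three_branch(impulse, shared_kernel).items():
        print(rate, receptive_field(len(shared_kernel), rate), response)
```

Feeding an impulse through the three branches shows the response spreading wider as the dilation rate grows, which is the sense in which a branch with a larger rate is suited to larger objects. The scale-specific training step (assigning each ground-truth object to the branch whose scale range it falls in) is not shown, because the abstract does not state the ranges.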
Keywords/Search Tags: Human-object interaction, Three-branch object detection network, HOI relationship detection, Three-stream network, Context convolution module