Understanding human behavior is the foundation on which intelligent service robots provide services. Human behavior involves not only the movements of the human body but also the current environment and the objects being interacted with. To understand behavior, body movements must therefore be combined with the interacted objects, which is the task of human-object interaction detection. In recent years, research on object detection has made great breakthroughs, whereas research on action detection has mostly been carried out in simple laboratory environments. These studies only predict the class of a human action, and few of them combine human actions with the interacted objects for behavior understanding. Moreover, simply combining existing object detection methods with human action detection methods cannot achieve human-object interaction detection in real environments. This paper therefore studies human-object interaction detection in complex environments. Considering that a person may perform several actions simultaneously in a real environment, it proposes a human-object interaction detection method that can effectively recognize multiple types of actions and accurately identify the objects those actions interact with in complex environments.

Because accurately detecting the classes and locations of the people and objects in an image is the basis of human-object interaction detection, this paper first studies object detection algorithms and designs a fast object detection model based on an attention mechanism. The model is a two-stage object detection model. It first uses the attention mechanism to generate a heat map that distinguishes foreground from background, uses the heat map to quickly screen anchor boxes into candidate regions, and then performs classification and bounding-box regression on the candidate regions. Finally, the model uses an improved non-maximum suppression algorithm to delete and merge the
overlapping boxes, making the objects contained in the boxes more complete, which benefits the extraction of spatial features in human-object interaction detection. Experiments with this model on the COCO dataset give an average precision of 38.9%, slightly lower than that of the Faster R-CNN model, but the detection time is reduced by about 24% compared with Faster R-CNN, so the real-time performance is better.

To address human-object interaction detection in complex environments, this paper proposes a human-object interaction detection model based on semantic information. To handle multi-category action recognition, the model pairs all people with all objects and performs human-object interaction detection on each human-object pair in turn. Its innovation is to combine semantic features with visual and spatial features when predicting human actions. It then predicts the human-object interaction score of each human-object pair and thereby realizes human-object interaction detection. Experiments on the V-COCO dataset show that this model outperforms a human-object interaction detection model based only on visual features and can effectively identify multiple actions of the same subject.

Next, this paper studies human-object interaction detection further, using graph convolution to process the visual features of people and objects and to extract the contextual information between them. Experiments with the graph-convolution-based human-object interaction detection model on the V-COCO dataset show that this model and the human-object interaction detection model based on the semantic branch each have their own advantages for different kinds of action
detection. This paper then integrates the two models and proposes a human-object interaction detection model based on graph convolution and semantic information. Experimental results on the V-COCO dataset show that this model effectively combines the advantages of the two models and achieves better human-object interaction detection results.

Finally, this paper expresses the results of human-object interaction detection as "human-action-object" triplets, completing a simple understanding of human behavior based on human-object interaction detection.
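
The pairing step and the "human-action-object" output described above can be illustrated with a minimal Python sketch. It is not the model proposed in this paper. The names `Detection`, `detect_triplets`, and `toy_scorer` are hypothetical, the per-action scores are placeholder numbers rather than outputs of the semantic, visual, and spatial branches, and multiplying the two detector confidences by the action score is only an assumed way of forming a pair score. Each action is thresholded independently, so one person can receive several actions at once.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Detection:
    label: str                               # class name, e.g. "person" or "cup"
    box: Tuple[float, float, float, float]   # (x1, y1, x2, y2)
    score: float                             # detector confidence in [0, 1]


# Hypothetical interface: maps a (human, object) pair to per-action scores.
PairScorer = Callable[[Detection, Detection], Dict[str, float]]
Triplet = Tuple[str, str, str, float]        # (human, action, object, score)


def detect_triplets(detections: List[Detection],
                    pair_scorer: PairScorer,
                    threshold: float = 0.5) -> List[Triplet]:
    """Pair every person with every object and keep confident actions."""
    humans = [d for d in detections if d.label == "person"]
    # Assumption: only non-person detections are treated as objects here.
    objects = [d for d in detections if d.label != "person"]

    triplets: List[Triplet] = []
    for human in humans:
        for obj in objects:
            for action, action_score in pair_scorer(human, obj).items():
                # Assumed pair score: product of detector and action scores.
                score = human.score * obj.score * action_score
                # Independent threshold per action allows multiple actions.
                if score >= threshold:
                    triplets.append((human.label, action, obj.label, score))
    # Highest-scoring interactions first.
    return sorted(triplets, key=lambda t: t[3], reverse=True)


def toy_scorer(human: Detection, obj: Detection) -> Dict[str, float]:
    # Placeholder scores standing in for a learned model.
    table = {
        "cup": {"hold": 0.9, "drink": 0.8, "ride": 0.1},
        "bicycle": {"hold": 0.2, "drink": 0.0, "ride": 0.1},
    }
    return table.get(obj.label, {})


detections = [
    Detection("person", (0.0, 0.0, 50.0, 100.0), 1.0),
    Detection("cup", (20.0, 30.0, 30.0, 40.0), 1.0),
    Detection("bicycle", (200.0, 50.0, 300.0, 120.0), 1.0),
]
triplets = detect_triplets(detections, toy_scorer, threshold=0.5)
for human, action, obj, score in triplets:
    print(f"{human}-{action}-{obj}: {score:.2f}")
```

In this toy input the single person is assigned both "hold" and "drink" for the cup, while the distant bicycle yields no triplet because none of its placeholder scores reaches the threshold.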