
Action Recognition Based On Human Skeleton

Posted on: 2021-05-16    Degree: Master    Type: Thesis
Country: China    Candidate: X C Li    Full Text: PDF
GTID: 2428330611499420    Subject: Computer Science and Technology
Abstract/Summary:
In visual surveillance systems, it is necessary to recognize the behavior of people handling objects such as a phone, a cup, or a plastic bag. Action recognition relies on the three-dimensional motion information of people, and quickly and accurately recognizing human actions in real surveillance video still poses many problems. To address them, this paper extracts human-body and object features from the human skeleton, image convolution features, and video spatio-temporal convolution features, and uses convolutional neural networks to identify human actions. The research content of this paper is divided into the following parts:

In this project, the recognition of human action is based on human skeleton data. I used an open-source algorithm, OpenPose, to detect the human skeleton (joint positions) in each video frame, and then used the skeleton as raw data to extract features and perform classification with machine learning algorithms. There are other methods for action recognition, such as using 3D Convolutional Neural Networks to recognize actions directly from video. However, such large networks are time-consuming and difficult to train, and they lack interpretability. By contrast, human-skeleton features are concise, intuitive, and effective for differentiating human actions. I therefore chose the human skeleton as the base feature for this action recognition project.

To understand the visual world, a machine must recognize not only individual object instances but also how they interact. Humans are often at the center of such interactions, and detecting human-object interactions is an important practical and scientific problem. In this paper, we address the task of detecting ⟨human, verb, object⟩ triplets in challenging everyday photos. We propose a novel model driven by a human-centric approach. Our hypothesis is that the appearance of a person (their pose, clothing, and action) is a powerful cue for localizing the objects they are interacting with. To exploit this cue, our model learns to predict an action-specific density over target object locations based on the appearance of a detected person. Our model also jointly learns to detect people and objects, and by fusing these predictions it efficiently infers interaction triplets in a clean, jointly trained, end-to-end system. We validate our approach on the recently introduced Verbs in COCO (V-COCO) and HICO-DET datasets, where we show quantitatively compelling results.

To address the recognition problem, we further propose a new framework for recognizing object-related human actions with graph convolutional networks that use human and object poses. In this framework, we construct skeletal graphs of reliable human poses by selectively sampling the informative frames in a video, i.e., frames whose human joints have high confidence scores from pose estimation. The skeletal graphs generated from the sampled frames represent human poses relative to the object position in both the spatial and temporal domains, and these graphs serve as inputs to the graph convolutional networks. Through experiments on an open benchmark and our own datasets, we verify the validity of our framework: our method outperforms the state-of-the-art method for skeleton-based action recognition.
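As a rough illustration of the skeletal-graph pipeline described above (confidence-based frame sampling followed by graph convolution over joints), the sketch below uses a hypothetical five-joint skeleton, made-up confidence values, and a single NumPy graph-convolution step; the joint set, threshold, and layer sizes are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

# Toy 5-joint skeleton (hypothetical subset): head, neck, l_hand, r_hand, hip.
# Edges follow the kinematic chain; A is the adjacency matrix with self-loops.
NUM_JOINTS = 5
edges = [(0, 1), (1, 2), (1, 3), (1, 4)]
A = np.eye(NUM_JOINTS)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
# Symmetric normalization D^{-1/2} A D^{-1/2}, as in standard GCNs.
d = A.sum(axis=1)
A_norm = A / np.sqrt(np.outer(d, d))

def sample_reliable_frames(poses, confidences, threshold=0.5):
    """Keep only frames whose mean joint confidence exceeds the threshold,
    mirroring the selective sampling of informative frames."""
    keep = confidences.mean(axis=1) > threshold
    return poses[keep]

def graph_conv(X, W):
    """One spatial graph convolution: aggregate neighboring joints, then
    project joint features with a learnable weight matrix and apply ReLU."""
    return np.maximum(A_norm @ X @ W, 0.0)

# Fake clip: 4 frames, 5 joints, (x, y) coordinates plus per-joint confidence.
rng = np.random.default_rng(0)
poses = rng.random((4, NUM_JOINTS, 2))
conf = np.array([[0.9] * 5, [0.2] * 5, [0.8] * 5, [0.7] * 5])

reliable = sample_reliable_frames(poses, conf)  # drops the low-confidence frame
W = rng.random((2, 8))                          # projection: 2 -> 8 channels
features = np.stack([graph_conv(frame, W) for frame in reliable])
print(reliable.shape, features.shape)           # (3, 5, 2) (3, 5, 8)
```

In a full model, the per-frame joint features would additionally be convolved along the temporal axis and pooled before classification; here only the spatial aggregation step is shown.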
Keywords/Search Tags: action recognition, graph convolution, temporal convolution, attention module, human skeleton, object skeleton