Skeleton-based Human Action Recognition Via Adaptively Harvesting Multi-modal Information

Posted on: 2021-02-24 | Degree: Master | Type: Thesis
Country: China | Candidate: J M Cai | Full Text: PDF
GTID: 2518306098999799 | Subject: Electronics and Communications Engineering
Abstract/Summary:
Human action recognition is an active yet challenging task in computer vision, and skeleton-based action recognition has attracted growing research attention in recent years. A human skeleton motion sequence retains the useful high-level motion signal while discarding the redundant information in the RGB video clip. Compared with the original RGB clip, a skeleton sequence, which represents the human body joints as 2D or 3D coordinates, is sparse. Neural networks designed for skeleton-based action recognition can therefore be lightweight and efficient. Because the sparse skeleton sequence carries no information about the appearance of the people or the scene in the video, skeleton-based action recognition also better protects user privacy.

A common drawback of currently popular skeleton-based action recognition methods is that sparse skeleton information alone is not sufficient to fully characterize human motion. First, because the human pose sequence is very sparse, it is difficult to accurately describe the subtle local motion of individual body parts. As a result, some existing skeleton-based methods cannot accurately recognize action categories that are characterized mainly by subtle local movements. Second, even among action categories with relatively large global motion, actions whose skeletons are very similar are easily confused by neural networks, because the skeleton sequence does not provide enough discriminative information.

To move beyond these limitations, this thesis proposes a novel scheme for human action recognition that augments a graph convolutional network (GCN)-based framework with lightweight visual information aligned with the skeletal joints. Specifically, we use Joint-aligned optical Flow Patches (JFP) to capture the subtle local motion around each joint. Compared with the pure skeleton-based baseline, this hybrid scheme effectively boosts performance while keeping the computational and memory overheads low. Experiments on the NTU RGB+D, NTU RGB+D 120, and Kinetics-Skeleton datasets demonstrate clear accuracy improvements of the proposed method over state-of-the-art skeleton-based methods.
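The idea of joint-aligned flow patches can be illustrated with a minimal sketch. The abstract does not give the patch size, the flow estimator, or how border joints are handled, so everything below is an assumption made for illustration: the function name `extract_joint_flow_patches`, the 8x8 default patch, the zero padding at the image border, and the clamping of out-of-frame joints are hypothetical and are not taken from the thesis. The sketch assumes a dense optical-flow field has already been computed for the frame.

```python
import numpy as np


def extract_joint_flow_patches(flow, joints_xy, patch_size=8):
    """Crop a square optical-flow patch around each skeletal joint.

    flow:      (H, W, 2) array of per-pixel (dx, dy) displacements.
    joints_xy: (J, 2) joint positions in pixel coordinates (x, y).
    Returns:   (J, patch_size, patch_size, 2) array of flow patches.

    Illustrative assumptions (not from the thesis): zero padding at the
    border and clamping of joints that fall outside the frame.
    """
    half = patch_size // 2
    h, w = flow.shape[:2]
    # Zero-pad spatially so patches near the border stay in bounds.
    padded = np.pad(flow, ((half, half), (half, half), (0, 0)))
    patches = np.empty((len(joints_xy), patch_size, patch_size, 2),
                       dtype=flow.dtype)
    for j, (x, y) in enumerate(np.rint(joints_xy).astype(int)):
        # Clamp joints that were detected outside the frame.
        x = min(max(x, 0), w - 1)
        y = min(max(y, 0), h - 1)
        # In padded coordinates the joint sits at (y + half, x + half),
        # so the window around it starts at (y, x).
        patches[j] = padded[y:y + patch_size, x:x + patch_size]
    return patches


# Toy example: a 16x16 flow field with one moving pixel under joint 0.
flow = np.zeros((16, 16, 2), dtype=np.float32)
flow[5, 7] = (1.0, 2.0)                      # row y=5, column x=7
joints = np.array([[7.0, 5.0], [0.0, 0.0]])  # (x, y) per joint
patches = extract_joint_flow_patches(flow, joints)
```

Each joint then carries a small local motion descriptor in addition to its coordinates, which is the kind of per-joint input a GCN-based model could consume. How the thesis actually encodes and fuses these patches is not stated in the abstract.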
Keywords/Search Tags:Human action recognition, human skeletons, GCN, optical flow