
Research On Action Recognition In Video-Skeleton Sequences Based On Deep Learning

Posted on: 2020-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: J Wu
Full Text: PDF
GTID: 2428330590474645
Subject: Mechanical and electrical engineering
Abstract/Summary:
Human action is a direct and effective modality in vision-based human-robot interaction. However, human action is a complex three-dimensional signal, and efficient, stable recognition in complex scenes remains difficult. To address the action recognition problem, this thesis extracts spatiotemporal action features from videos, from human skeleton sequences, and from their fusion, and uses convolutional neural networks for classification. The main research content of this thesis includes the following aspects:

(1) A video-based two-stream CNN algorithm for action recognition. To address the slow computation of dense optical flow in existing two-stream CNNs, an end-to-end model is proposed for both training and recognition. It contains two streams, a spatial stream and a global temporal stream, to characterize and recognize actions. Based on MobileNetV2, the spatial stream learns features from action images and the global temporal stream learns features from Energy Motion History Images (EMHI); the two streams are then fused. Finally, a multi-frame fusion method is used to improve accuracy.

(2) A skeleton-based action recognition algorithm with convolutional neural networks. Video-based CNNs are less robust to scene changes and cannot recognize actions at night, so a real-time action recognition system based on skeleton sequences is proposed. A view-invariant transformation is first applied to the human skeleton sequence to eliminate the influence of viewpoint. The sequence is then encoded into RGB space, preserving the original spatial structure and temporal dynamics. Finally, a lightweight CNN is designed to classify the encoded RGB image.

(3) A multi-data temporal action detection algorithm. The temporal action detection (TAD) problem is innovatively transformed into a one-dimensional object detection problem, and a two-stream network based on YOLO is proposed. The input of the network combines the video and skeleton sequences from a Kinect sensor. In the video stream, a C3D feature extractor extracts high-dimensional features from short-term video clips. In the skeleton stream, the view-invariant transformation is applied to the skeleton sequence. The high-dimensional features of the two streams are encoded as input to the two-stream object detection network, and two methods are designed to fuse them.
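The abstract does not give the exact encoding used in aspect (2). As an illustration only, the following is a minimal sketch of one common way a skeleton sequence can be mapped into an RGB image for a CNN: joints become image rows (spatial structure), frames become columns (temporal dynamics), and the three coordinate axes map to the R, G, B channels. The function name `encode_skeleton_to_rgb` and the per-sequence min-max normalization are assumptions, not the thesis's actual method.

```python
import numpy as np

def encode_skeleton_to_rgb(skeleton, out_dtype=np.uint8):
    """Encode a skeleton sequence as an RGB image.

    skeleton: array of shape (T, J, 3) -- T frames, J joints,
    (x, y, z) coordinates, assumed already view-normalized.
    Returns an image of shape (J, T, 3): rows are joints, columns
    are frames, and x/y/z map to the R/G/B channels.
    """
    skeleton = np.asarray(skeleton, dtype=np.float64)
    # Scale each coordinate axis to [0, 255] over the whole sequence
    # so relative motion between frames is preserved.
    lo = skeleton.min(axis=(0, 1), keepdims=True)
    hi = skeleton.max(axis=(0, 1), keepdims=True)
    scaled = (skeleton - lo) / np.maximum(hi - lo, 1e-8) * 255.0
    # Transpose (T, J, 3) -> (J, T, 3): joints as rows, frames as columns.
    return scaled.transpose(1, 0, 2).astype(out_dtype)
```

The resulting fixed-layout image can then be fed to a lightweight image-classification CNN, which is the general idea behind encoding both spatial structure and temporal dynamics into RGB space.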
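Casting temporal action detection as one-dimensional object detection, as in aspect (3), means that detections are (start, end) segments on the time axis instead of 2-D boxes, and the box IoU used for matching and non-maximum suppression reduces to a 1-D overlap measure. A minimal sketch of that measure (the function name `temporal_iou` is an assumption for illustration, not from the thesis):

```python
def temporal_iou(a, b):
    """Intersection-over-union of two temporal segments.

    Each segment is a (start, end) pair, in frames or seconds.
    This is the 1-D analogue of the 2-D box IoU used by detectors
    such as YOLO for matching predictions to ground truth.
    """
    start = max(a[0], b[0])          # latest start
    end = min(a[1], b[1])            # earliest end
    inter = max(0.0, end - start)    # overlap length (0 if disjoint)
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0
```

For example, segments (0, 10) and (5, 15) overlap for 5 units over a union of 15, giving an IoU of 1/3.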
Keywords/Search Tags: Action Recognition, Convolutional Neural Networks, Temporal Action Detection, Object Detection