Research On Efficient Action Recognition Based On Deep Learning

Posted on: 2023-10-27

Degree: Doctor

Type: Dissertation

Country: China

Candidate: Z Huang

GTID: 1528306902459094

Subject: Information and Communication Engineering

Abstract/Summary:

Video is becoming one of the most important data modalities in the era of big data. Compared with static images, videos are more complex because they also carry motion and audio information, which makes video-based tasks such as video processing, video classification and video retrieval more challenging. Human action recognition is one of the most important topics in video understanding. It can be applied to virtual reality, human-computer interaction, intelligent surveillance and video retrieval, and it provides technical support for online short-video services, live-streaming platforms and video websites. The human body is the most important subject in videos, and machines cannot truly understand videos until they can understand the patterns of human action. However, previous methods for human action recognition ignore the natural structure of the human body, which leads to spatio-temporal features with poor representation ability. On the other hand, directly increasing the number of model parameters to improve performance inevitably introduces higher computational complexity and makes the models hard to train. To solve these two problems, this thesis focuses on improving the modeling of human body structure and on accelerating models for different modalities of video data. It proposes 3D local convolutional neural networks, spatio-temporal inception graph convolutional networks, and inheritance and exploration knowledge distillation, which simultaneously increase the representation ability of features and reduce the computational cost. The contributions and novelty can be summarized as follows.

(1) To better model the features of human body parts in video-based action recognition, this thesis introduces 3D local convolutional neural networks. It introduces 3D local operations as a generic family of building blocks that extract 3D local features from an adaptive 3D local neighborhood. The proposed 3D local operations support extracting local 3D volumes of body parts in a sequence with adaptive spatial and temporal scales, locations and lengths. In this way, the spatio-temporal patterns of each body part are learned from its 3D local neighborhood at part-specific scales, locations, frequencies and lengths. Based on this 3D local operation, the thesis proposes a simple but effective 3D local convolutional network for action recognition. The network can be combined with any existing architecture to explicitly extract the distinctive motion features of different human body parts and improve the representation ability of spatio-temporal features.

(2) To model the graph structure of skeleton sequences in skeleton-based action recognition, this thesis introduces spatio-temporal inception graph convolutional networks. By exploring the graph structure of the joints in a skeleton sequence, it proposes to extract and merge multi-scale graph features within the graph network. To implement this idea, the thesis proposes a simple but highly modularized graph convolutional network for skeleton-based action recognition. The approach overcomes the limitations of previous methods by extracting and synthesizing information at different scales and under different transformations from multiple paths at different layers. In addition, it aggregates multi-scale information along both the spatial and temporal dimensions. Extensive experiments show that it surpasses previous state-of-the-art methods on multiple datasets while greatly reducing the number of parameters and the computational cost.

(3) To further reduce computational and training cost, this thesis introduces a model distillation algorithm that improves computational efficiency while preserving the high performance of neural networks. Forcing the student model to closely mimic the probabilities or features of the teacher model largely limits its ability to learn knowledge and features that the teacher has not discovered. To address this issue, the thesis proposes a novel inheritance and exploration distillation framework in which the student model is split into two parts: inheritance and exploration. The inheritance part is trained with a similarity loss that transfers the knowledge already learned by the teacher model to the student model, while the exploration part is encouraged, through a dissimilarity loss, to learn representations that differ from the inherited ones. The framework is generic and can easily be combined with existing distillation or knowledge transfer methods for training deep neural networks. Extensive experiments demonstrate that the two parts jointly push the student model to learn more diverse and effective representations.
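The abstract does not say how the adaptive 3D local neighborhoods of contribution (1) are extracted. The following is a minimal sketch under stated assumptions, not the thesis's implementation. It assumes each body-part volume is described by a normalized center and extent along time, height and width, and that the volume is resampled onto a fixed output grid by trilinear interpolation, so volumes of different scales, locations and lengths end up with the same shape. In the actual network these parameters would be predicted per body part. Here they are passed in by hand, and `sample_local_volume`, `trilinear_sample` and their arguments are hypothetical names.

```python
import numpy as np


def trilinear_sample(x, t, h, w):
    """Trilinearly interpolate x (C, T, H, W) at one fractional index (t, h, w).
    Out-of-range coordinates are clamped to the border."""
    C, T, H, W = x.shape
    t = min(max(t, 0.0), T - 1.0)
    h = min(max(h, 0.0), H - 1.0)
    w = min(max(w, 0.0), W - 1.0)
    t0, h0, w0 = int(np.floor(t)), int(np.floor(h)), int(np.floor(w))
    t1, h1, w1 = min(t0 + 1, T - 1), min(h0 + 1, H - 1), min(w0 + 1, W - 1)
    dt, dh, dw = t - t0, h - h0, w - w0
    out = np.zeros(C)
    # weighted sum over the 8 surrounding grid points
    for ti, wt in ((t0, 1 - dt), (t1, dt)):
        for hi, wh in ((h0, 1 - dh), (h1, dh)):
            for wi, ww in ((w0, 1 - dw), (w1, dw)):
                out += wt * wh * ww * x[:, ti, hi, wi]
    return out


def sample_local_volume(x, center, extent, out_size):
    """Hypothetical helper: resample a local 3D volume from x (C, T, H, W).
    center:   (ct, ch, cw) in [0, 1], normalized position of the volume center
    extent:   (lt, lh, lw) in (0, 1], fraction of full length/height/width covered
    out_size: (ot, oh, ow) fixed output grid shared by volumes of any scale"""
    C, T, H, W = x.shape
    axes = []
    for c, l, n, o in zip(center, extent, (T, H, W), out_size):
        # evenly spaced normalized positions over [c - l/2, c + l/2], mapped to indices
        pos = np.linspace(c - l / 2, c + l / 2, o) if o > 1 else np.array([c])
        axes.append(pos * (n - 1))
    out = np.zeros((C,) + tuple(out_size))
    for i, t in enumerate(axes[0]):
        for j, h in enumerate(axes[1]):
            for k, w in enumerate(axes[2]):
                out[:, i, j, k] = trilinear_sample(x, t, h, w)
    return out
```

For example, `sample_local_volume(x, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5), (3, 3, 3))` on a 5x5x5 feature map returns its central 3x3x3 block. Because the volume is interpolated rather than cropped at integer indices, the center and extent may take fractional values. This is what would allow them to be learned by gradient descent in a differentiable framework.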
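To make the multi-scale idea in contribution (2) concrete, here is a minimal NumPy sketch of an Inception-style graph block under simplifying assumptions. Each branch aggregates joint features over a different number of hops of the normalized skeleton adjacency, which gives a spatial scale. It then applies its own linear projection and averages over a different temporal window, which gives a temporal scale. The branch outputs are concatenated along channels. Random weights stand in for learned ones. `inception_graph_block`, `temporal_mean` and the branch configuration are hypothetical, and the thesis's actual paths and transformations may differ.

```python
import numpy as np


def normalize_adjacency(A):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, as in standard GCNs."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def temporal_mean(X, k):
    """Average X (T, V, C) over a window of k frames with zero padding; keeps T."""
    T = X.shape[0]
    pad = k // 2
    Xp = np.pad(X, ((pad, k - 1 - pad), (0, 0), (0, 0)))
    return np.stack([Xp[t:t + k].mean(axis=0) for t in range(T)])


def inception_graph_block(X, A, branches, rng):
    """X: (T, V, C_in) skeleton features; A: (V, V) joint adjacency.
    branches: list of (hops, temporal_window, c_out) tuples (hypothetical config)."""
    A_norm = normalize_adjacency(A)
    outputs = []
    for hops, window, c_out in branches:
        A_k = np.linalg.matrix_power(A_norm, hops)            # spatial scale: k-hop neighbourhood
        W = rng.standard_normal((X.shape[2], c_out)) * 0.1    # random stand-in for learned weights
        H = np.einsum("uv,tvc->tuc", A_k, X) @ W              # graph conv: aggregate neighbours v into joint u
        H = temporal_mean(H, window)                          # temporal scale
        outputs.append(np.maximum(H, 0.0))                    # ReLU
    return np.concatenate(outputs, axis=-1)                   # Inception-style channel concat
```

For a 5-joint chain and 8 frames, three branches with 4 output channels each produce an output of shape (8, 5, 12). Concatenating the branches rather than summing them keeps each scale in its own channels, so later layers can weight the scales independently.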
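The inheritance and exploration objectives in contribution (3) can be sketched as two feature-level losses. This is a minimal sketch with assumed choices and does not reproduce the thesis's exact formulation. The student feature is split in half along channels, and the teacher feature is assumed to already have the same dimension as each half. The similarity loss is one minus cosine similarity. The dissimilarity loss is the squared cosine similarity against the teacher feature, which the inheritance part is trained to match, and it is zero when the exploration half is orthogonal to that feature. The function names and the weights `alpha` and `beta` are hypothetical.

```python
import numpy as np


def cosine(a, b, eps=1e-8):
    """Row-wise cosine similarity between two (N, D) arrays."""
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return num / den


def inheritance_exploration_losses(student_feat, teacher_feat):
    """student_feat: (N, 2D); teacher_feat: (N, D), assumed already projected to D.
    First half of the student channels = inheritance part, second half = exploration part."""
    d = student_feat.shape[1] // 2
    inherit, explore = student_feat[:, :d], student_feat[:, d:]
    # similarity loss: 0 when the inheritance part points the same way as the teacher
    loss_inherit = np.mean(1.0 - cosine(inherit, teacher_feat))
    # dissimilarity loss: 0 when the exploration part is orthogonal to the teacher
    loss_explore = np.mean(cosine(explore, teacher_feat) ** 2)
    return loss_inherit, loss_explore


def total_loss(task_loss, student_feat, teacher_feat, alpha=1.0, beta=1.0):
    """Task loss plus weighted inheritance and exploration terms (hypothetical weighting)."""
    li, le = inheritance_exploration_losses(student_feat, teacher_feat)
    return task_loss + alpha * li + beta * le
```

In training, these terms would be added to the ordinary task loss, as `total_loss` does. The squared cosine is used so that the exploration term pushes toward decorrelated features rather than features pointing opposite to the teacher's.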

Keywords/Search Tags:

Deep Learning, Action Recognition, Convolutional Neural Network, Graph Convolutional Neural Network, Model Distillation

Related items

1	Research On Human Action Recognition Based On Graph Convolutional Neural Networks
2	Design And Implementation Of Human Action Recognition System Based On Graph Convolutional Neural Network
3	Behavior Recognition Based On Graph Convolutional Neural Network
4	Research On Action Recognition Based On 3D Convolutional Neural Network
5	Research On Human Action Recognition Based On Graph Convolutional Neural Networks
6	Action Recognition Method Based On Sparse Auto-Combination Spatio-Temporal Convolutional Neural Network And Its MapReduce Implementation
7	Human Skeletal Action Recognition Based On Deep Learning
8	Study On Action Recognition Algorithm In Monitoring Environment Based On Convolutional Neural Network
9	Research And Application Of Gabor Convolutional Neural Network Model
10	The Research On Video Action Recognition Based On Lightweight 3D Convolutional Neural Network