Deep Action Recognition Using Cross-View Video Prediction At Edge

Posted on:2024-01-04

Degree:Master

Type:Thesis

Country:China

Candidate:R X Zhang

GTID:2568307172969839

Subject:Computer application technology

Abstract/Summary:

The combination of artificial intelligence and edge computing opens a new research direction for deep action recognition. In an edge intelligence environment, the deep action recognition model is deployed directly on the terminal device. Because deep network models have high computational complexity, deep action recognition methods based on a single view cannot yet achieve good recognition results. Data collected by multiple cameras in the scene can therefore be used to perform multi-view action recognition through contrastive learning and so improve recognition performance. However, because deep network models involve large amounts of data and high computational complexity, deploying deep action recognition models on edge devices must take resource constraints into account. Knowledge distillation can reduce the complexity of the deployed model while preserving recognition accuracy. In addition, updating the model requires a large number of manual labels, which are not available in a real edge environment, so a supervised deep action recognition model cannot further improve its recognition performance. Semi-supervised and self-supervised deep action recognition methods therefore need to be studied for edge action recognition tasks.

To address these problems, this thesis first proposes a self-supervised multi-view human action representation learning method based on contrastive learning (Multi-view Action Recognition based on Contrastive Learning, MAR-NET) to improve the recognition performance of action recognition tasks in edge intelligence environments. MAR-NET takes multi-view data as input, can learn online in a self-supervised manner, and processes video data only on the spatial stream to extract action features. In addition, a multi-view self-supervised contrastive action recognition strategy is designed for MAR-NET; it learns view-independent action features from multiple views by maximizing mutual information, which improves recognition.

Second, this thesis proposes a multi-view human action recognition method based on cross-modal distillation (CrossModel Distillation for Multi-View Action Recognition, CMMVL) to improve recognition accuracy in real edge scenes. The method builds on the self-supervised multi-view human action representation learning method: it uses a cross-modal distillation algorithm to introduce skeleton data, and a complex deep network pre-trained on skeleton data guides multiple RGB-based student networks in learning action features. In addition, a knowledge distillation method for multi-student models is designed, which efficiently learns the knowledge obtained from the teacher model and is easy to deploy in edge intelligence environments.

Finally, in a real edge intelligence environment composed of edge nodes with different hardware configurations, a multi-view deep action recognition platform for edge intelligence (Deep Action Recognition using Cross-view Video Prediction at Edge, DARCV) is designed and implemented. It supports MAR-NET and CMMVL in completing action recognition tasks and improving model performance. Experimental results on the DARCV platform show that MAR-NET efficiently learns view-independent action features from multiple views while improving the performance of the self-supervised action recognition model, and that CMMVL, as an optimization of MAR-NET, significantly improves recognition accuracy.
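
The abstract does not give the loss functions used by MAR-NET or CMMVL, so the following is only a minimal sketch of the two generic ideas it names, not the thesis's implementation. It assumes an InfoNCE-style contrastive loss between two camera views (minimizing it maximizes a lower bound on the mutual information between the views) and a temperature-softened KL distillation loss between a teacher and a student. The function names, the temperature values, and the averaging over students are all assumptions made for illustration.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(view_a, view_b, temperature=0.1):
    """InfoNCE loss between two views of the same batch of clips.

    view_a[i] and view_b[i] are embeddings of the same action seen by two
    cameras (the positive pair); every view_b[j] with j != i is a negative.
    """
    a = [_normalize(v) for v in view_a]
    b = [_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [_dot(anchor, other) / temperature for other in b]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


def softmax(logits, temperature=1.0):
    scaled = [l / temperature for l in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened class distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl  # conventional T^2 gradient rescaling


def multi_student_loss(teacher_logits, students_logits, temperature=4.0):
    """Assumed aggregation: average the distillation loss over all students."""
    losses = [distillation_loss(teacher_logits, s, temperature) for s in students_logits]
    return sum(losses) / len(losses)
```

In this sketch, each RGB student would be one camera view's network and the teacher would be the network pre-trained on skeleton data; how the thesis actually combines the contrastive and distillation terms is not stated in the abstract.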

Keywords/Search Tags:

Edge Intelligence, Multi-view Action Recognition, Contrastive Learning, Cross-modal Knowledge Distillation, Self-supervised Video Learning Experiment Platform

Related items

1	Research On Action Recognition Based On Skeleton Data
2	Research On Self-supervised Action Recognition Based On Contrast Learning
3	Cross-Modal Sketch Retrieval Based On Self-Supervised Learning And Knowledge Distillation
4	Cross-modal Representation Learning Based On Multi-negatives Supervised Contrastive Mechanism And Its Application
5	Research On RGB Video And 3D Skeletal Sequence Based Cross-view Human Action Recognition
6	Research On Geometric Solution Method Based On Cross-modal Learning
7	Research On Semi-supervised Human Action Recognition Based On Convolutional Neural Network
8	Research On Video Action Recognition Based On Transfer Learning
9	Research On Deep Cross-modal Retrieval Algorithm Based On Representation Learning
10	Research On View-invariant Human Action Understanding In Skeleton Sequences