Deep Action Recognition Using Cross-View Video Prediction At Edge

Posted on: 2024-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: R X Zhang
GTID: 2568307172969839
Subject: Computer application technology
Abstract:
The combination of artificial intelligence and edge computing opens a new research direction for deep action recognition. In the emerging edge intelligence environment, deep action recognition models are deployed directly on terminal devices. At present, however, deep action recognition methods based on a single view do not achieve satisfactory recognition performance. Data collected by multiple cameras in the same scene can therefore be used to perform multi-view action recognition through contrastive learning and improve recognition performance. Because deep network models process large amounts of data and have high computational complexity, deploying them on edge devices must also take resource constraints into account; knowledge distillation can reduce the complexity of the deployed model while preserving recognition accuracy. Moreover, supervised models require large numbers of manual labels to update themselves, and such labels cannot be provided in real edge environments, so supervised deep action recognition models cannot further improve their recognition performance there. Semi-supervised and self-supervised deep action recognition methods for edge action recognition tasks are therefore needed.

To address these problems, this thesis first proposes a self-supervised multi-view human action representation learning method based on contrastive learning (Multi-view Action Recognition based on Contrastive Learning, MAR-NET) to improve action recognition performance in edge intelligence environments. MAR-NET takes multi-view data as input, learns online in a self-supervised manner, and extracts action features by processing video data in the spatial stream only. In addition, a multi-view self-supervised contrastive recognition strategy is designed for MAR-NET: it learns view-independent action features from multiple views by maximizing mutual information, which improves recognition performance.

Second, this thesis proposes a multi-view human action recognition method based on cross-modal distillation (Cross-Modal Distillation for Multi-View Action Recognition, CMMVL) to improve recognition accuracy in real edge scenarios. Building on the self-supervised multi-view representation learning method above, CMMVL uses a cross-modal distillation algorithm to introduce skeleton data: a complex deep network pre-trained on skeleton data acts as the teacher and guides multiple RGB-based student networks in learning action features, which improves recognition accuracy. A knowledge distillation method for multiple student models is also designed; it lets the students learn efficiently from the knowledge acquired by the teacher model and is easy to deploy in edge intelligence environments.

Finally, an edge-intelligence-oriented multi-view deep action recognition platform (Deep Action Recognition using Cross-view Video Prediction at Edge, DARCV) is designed and implemented in a real edge intelligence environment composed of edge nodes with different hardware configurations. DARCV supports MAR-NET and CMMVL in completing action recognition tasks and improving model performance. Experimental results on the DARCV platform show that MAR-NET efficiently learns view-independent action features from multiple views while improving the performance of the self-supervised action recognition model, and that CMMVL, which optimizes MAR-NET, significantly improves recognition accuracy.
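The abstract says only that MAR-NET learns view-independent features by maximizing mutual information across views; it does not give the loss. A common way to do this is an InfoNCE objective, since minimizing it maximizes a lower bound on the mutual information between two views. The NumPy sketch below assumes each camera view has already been encoded into a (batch, dim) embedding array whose row i describes the same clip in every view, and averages InfoNCE over all ordered view pairs. The function names, the temperature of 0.1, and the pairing scheme are illustrative assumptions, not MAR-NET's actual implementation.

```python
from typing import Sequence

import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce(anchor: np.ndarray, positive: np.ndarray, temperature: float = 0.1) -> float:
    """InfoNCE loss between two camera views of the same batch of clips.

    Row i of `anchor` and row i of `positive` embed the same action clip seen
    from two cameras (the positive pair); every other row of `positive` acts
    as a negative for row i.
    """
    a = l2_normalize(anchor)
    p = l2_normalize(positive)
    logits = a @ p.T / temperature                        # (batch, batch) similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilize exp()
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching clip for row i sits in column i, so its log-probability is on the diagonal.
    return float(-np.mean(np.diag(log_probs)))


def multi_view_contrastive_loss(views: Sequence[np.ndarray], temperature: float = 0.1) -> float:
    """Average InfoNCE over every ordered pair of distinct views (hypothetical helper)."""
    losses = [
        info_nce(views[i], views[j], temperature)
        for i in range(len(views))
        for j in range(len(views))
        if i != j
    ]
    return float(np.mean(losses))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clips = rng.normal(size=(8, 16))  # toy data: 8 clips, 16-dim embeddings
    # Embeddings that agree across 3 views vs. embeddings that ignore the view pairing.
    aligned = [clips + 0.01 * rng.normal(size=clips.shape) for _ in range(3)]
    unaligned = [rng.normal(size=clips.shape) for _ in range(3)]
    print("aligned views:  ", multi_view_contrastive_loss(aligned))
    print("unaligned views:", multi_view_contrastive_loss(unaligned))
```

Averaging over both orderings (i, j) and (j, i) treats the views symmetrically, because each InfoNCE term normalizes over the candidates of only one view at a time.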
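The abstract likewise does not specify CMMVL's distillation loss. As an illustration only, the sketch below uses the soft-target term of the standard knowledge distillation objective (Hinton et al., 2015): each RGB student is trained to match the temperature-softened class distribution of a single teacher pre-trained on skeleton data, and the losses of all students (assumed here to be one per camera view) are averaged. The label-free KL form, the temperature of 4.0, the equal weighting of students, and all function names are assumptions; in practice the logits would come from the teacher and student networks run on paired skeleton and RGB clips.

```python
from typing import Sequence

import numpy as np


def softmax(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Temperature-scaled softmax over the class axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # stabilize exp()
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_kl(teacher_logits: np.ndarray, student_logits: np.ndarray,
                    temperature: float = 4.0) -> float:
    """KL(teacher || student) on softened class distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    eps = 1e-12  # guards against log(0)
    kl = np.sum(p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps)), axis=-1)
    # Soft-target gradients shrink as 1/T^2, so scaling by T^2 keeps them comparable.
    return float(np.mean(kl) * temperature ** 2)


def multi_student_distillation_loss(teacher_logits: np.ndarray,
                                    student_logits_per_view: Sequence[np.ndarray],
                                    temperature: float = 4.0) -> float:
    """Average distillation loss of several RGB students against one skeleton teacher.

    teacher_logits: (batch, classes) from a network pre-trained on skeleton data.
    student_logits_per_view: one (batch, classes) array per camera view, produced by
    that view's RGB student on the same batch of clips (hypothetical setup).
    """
    return float(np.mean([distillation_kl(teacher_logits, s, temperature)
                          for s in student_logits_per_view]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(4, 10))  # toy data: 4 clips, 10 action classes
    close_students = [teacher + 0.1 * rng.normal(size=teacher.shape) for _ in range(3)]
    far_students = [rng.normal(size=teacher.shape) for _ in range(3)]
    print("students near teacher:    ", multi_student_distillation_loss(teacher, close_students))
    print("students far from teacher:", multi_student_distillation_loss(teacher, far_students))
```

One reason to average rather than sum the per-student losses is that the loss scale then stays independent of the number of cameras, so adding a view does not by itself change the effective learning rate.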
Keywords: Edge Intelligence, Multi-view Action Recognition, Contrastive Learning, Cross-modal Knowledge Distillation, Self-supervised Video Learning, Experiment Platform