The Robustness Verification And Performance Improvement Of Human-Skeleton-Based Action Recognition

Posted on: 2021-01-28  Degree: Master  Type: Thesis
Country: China  Candidate: L Zhou  Full Text: PDF
GTID: 2428330647950762  Subject: Computer science and technology
Abstract/Summary:
The rapid development of deep neural networks has effectively promoted research in computer vision. More and more computer vision work builds on deep neural networks and has made substantial progress. Applications such as autonomous driving, face identification, and virtual reality facilitate and enrich modern life. In addition, recognizing human actions in video helps in understanding video content and can be widely used in video surveillance, human-computer interaction, and similar settings.

The main work of this thesis is the robustness verification and performance improvement of human-skeleton-based action recognition. Devices such as cameras and mobile phones have limited storage space and computing resources. Among the input modalities for action recognition, skeleton data has the lowest storage and computation cost, which makes it the most suitable input for action recognition on resource-limited devices. Human-skeleton-based action recognition has made some progress, but its robustness has not been verified. This thesis proposes a global perturbation generation method based on the orthogonality of the classification probability space. On this basis, a distillation-based substitute model attack is developed to verify the lack of robustness of recognition networks based on spatiotemporal graph convolution. In addition, a small network improves robustness, but its accuracy is insufficient. Considering the need to improve both accuracy and robustness, this thesis proposes an action recognition architecture based on multimodal feature fusion, which improves the accuracy and robustness of human-skeleton-based action recognition in scenarios where storage space and computing resources are limited.

The main contributions of this thesis are as follows:

1. A white-box attack scheme based on a global perturbation derived from the orthogonality of the classification probability space is proposed, which can effectively verify the robustness of network models for human-skeleton-based action recognition. Building on the white-box attack, a black-box attack based on a distilled substitute model is proposed. For the global perturbation, a slight perturbation is added to the sample so that its classification probability space changes and the new classification probability space is orthogonal to the original one. We successfully generate a small perturbation that applies to the data globally. For the substitute model attack, knowledge distillation is used: the original attacked network generates "soft targets", and only a small amount of data is needed to train a substitute model with good generalization performance. We then generate a global perturbation for the substitute model with the white-box attack method proposed in this thesis. This global perturbation is also effective against the original model.

2. A human-skeleton-based action recognition method based on multimodal feature fusion is proposed to improve both accuracy and robustness. Recognition that uses the skeleton modality alone is not robust enough and is prone to misclassification: some actions are relatively static, and their skeleton shapes are very similar to those of other categories. We use a small amount of image information as a supplement by extracting one frame from each video. With the feature replacement training process proposed in this thesis and a KL divergence constraint, we learn two types of features for each modality: a shared part that generalizes to other modalities, and a part that helps classification but is more specific to the dataset and the modality. The KL divergence loss also makes the shared, generalized parts of the skeleton and image features complement each other, improving the classification boundary of a single modality. Experimental results show that the proposed method improves the accuracy and robustness of human-skeleton-based action recognition in scenarios where storage space and computing resources are limited.
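
The soft targets, the KL divergence constraint, and the orthogonality criterion mentioned in the abstract can be illustrated with a small Python sketch. This is not the thesis's implementation: the function names, the temperature value, and the example logits and probabilities are all assumptions chosen for illustration, and cosine similarity is used here only as one plausible way to measure how close two class-probability vectors are to orthogonal.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives softer targets."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cosine_similarity(p, q):
    """Cosine of the angle between two vectors; 0 means they are orthogonal."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm_p = math.sqrt(sum(pi * pi for pi in p))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    return dot / (norm_p * norm_q)


# Hypothetical logits for a 3-class problem (illustrative values only).
teacher_logits = [4.0, 1.0, 0.0]   # stands in for the attacked network
student_logits = [3.0, 1.5, 0.5]   # stands in for the substitute model

# Soft targets: the same teacher logits, softened with an assumed temperature.
hard = softmax(teacher_logits, temperature=1.0)
soft = softmax(teacher_logits, temperature=4.0)

# Distillation-style loss: KL divergence from soft targets to student output.
student_soft = softmax(student_logits, temperature=4.0)
distill_loss = kl_divergence(soft, student_soft)

# Orthogonality check between clean and perturbed class probabilities
# (idealized one-hot vectors, assumed for illustration).
clean_probs = [1.0, 0.0, 0.0]
attacked_probs = [0.0, 1.0, 0.0]
orthogonality = cosine_similarity(clean_probs, attacked_probs)
```

Raising the temperature flattens the teacher distribution, so the substitute model sees relative class similarities rather than only the top label. A cosine similarity of 0 between the clean and perturbed probability vectors corresponds to the orthogonal case described in the abstract.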
Keywords/Search Tags: Skeleton-Based Action Recognition, White-Box Attack, Black-Box Attack, Robustness, Multimodal Fusion