| With the rapid development of deep learning and computer vision, human action recognition has become a popular branch of computer vision research. It now has a wide range of applications in industry and daily life, such as intelligent surveillance, accident warning, and human-computer interaction. Skeleton-based human action recognition has made remarkable progress, but the poor generality and high cost of existing methods remain a major challenge for the field. This paper studies human action recognition based on skeleton data and convolutional neural networks. Aiming at high-accuracy recognition from small sample sets, it investigates three aspects: the description of human action features, the augmentation of skeleton sample data, and attention mechanisms. It proposes a set of more accurate spatio-temporal features for describing human actions, a data augmentation strategy based on skeleton action sequences, and an adaptive multi-scale hybrid attention network model. Together, these yield a skeleton-based human action recognition algorithm with high generality. The main contributions of this paper are as follows: (1) To address the low recognition accuracy caused by an insufficient description of human motion features, this paper proposes a set of relative action features, comprising relative coordinates, instantaneous velocity, and instantaneous direction of motion, that describe human movements in finer detail and more accurately. In addition, spatial and temporal features are extracted from both the joint and the skeletal-edge perspectives to compensate for the potential shortcomings of using joint features alone. The proposed relative action features not only contain spatial features but also provide short-term and long-term temporal information, which makes the
recognition more accurate. (2) To address the high cost of sample collection and calibration, and the tendency of small data sets to cause model overfitting, this paper draws on the skeletal structure of the human body and the characteristics of action execution to propose sample augmentation strategies based on skeleton action sequences: rotating the skeleton to simulate actions performed in different directions, scaling the skeleton to simulate actions performed by people of different heights, shifting the skeleton to simulate actions performed at different positions, and adding or removing action frames to simulate actions performed at different speeds. These strategies quickly generate a large number of reliable human action samples in both the spatial and temporal dimensions, which mitigates the overfitting caused by insufficient samples and improves the generalization ability of the network. Tests on the data set indicate that the proposed skeleton-based spatio-temporal data augmentation strategy is generally applicable. (3) Because human actions differ in duration, action samples vary in size, which conflicts with the fixed-size input required by traditional convolutional neural networks. At the same time, similar actions are difficult to distinguish and are recognized with low precision because their action features are similar. This paper proposes an adaptive multi-scale hybrid attention network model consisting of two plug-and-play modules: an adaptive multi-scale pooling module and an adaptive hybrid attention module. The adaptive multi-scale pooling module accepts input of arbitrary scale and produces output of a single scale, which meets the practical need for multi-scale input. Moreover, multi-scale sample input enables the model to learn multi-scale features, which further improves its accuracy. The adaptive hybrid attention module captures important features
in the channel and spatial dimensions, focuses on important information, and addresses the low recognition accuracy between similar actions. In addition, its convolution kernel is selected adaptively so that the module can also accept multi-scale samples, which meets the requirement of multi-scale input and further improves the generality of the proposed method. |
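To make the relative action features of contribution (1) more concrete, the following NumPy sketch computes relative coordinates, instantaneous velocity, and instantaneous direction of motion for a skeleton sequence, plus skeletal-edge vectors. It is an illustration under stated assumptions, not the paper's implementation: the `(frames, joints, 3)` array layout, the choice of a root joint as the reference point, the frame-difference definition of velocity, and the names `relative_features` and `bone_vectors` are all hypothetical.

```python
import numpy as np


def relative_features(seq, root=0):
    """Per-joint features for a sequence shaped (frames, joints, 3).

    Assumption: coordinates are taken relative to one root joint and
    velocity is the plain frame-to-frame difference.
    """
    seq = np.asarray(seq, dtype=float)
    # Relative coordinates: every joint expressed relative to the root joint.
    rel = seq - seq[:, root:root + 1, :]
    # Instantaneous velocity: displacement since the previous frame
    # (the first frame has no predecessor, so it stays zero).
    vel = np.zeros_like(seq)
    vel[1:] = seq[1:] - seq[:-1]
    # Instantaneous direction: unit velocity vector, zero for static joints.
    speed = np.linalg.norm(vel, axis=-1, keepdims=True)
    direction = np.divide(vel, speed, out=np.zeros_like(vel), where=speed > 0)
    return rel, vel, direction


def bone_vectors(seq, edges):
    """Skeletal-edge view: child minus parent for each (parent, child) pair."""
    seq = np.asarray(seq, dtype=float)
    parents, children = zip(*edges)
    return seq[:, list(children), :] - seq[:, list(parents), :]
```

The same velocity and direction computation could be applied to the output of `bone_vectors` to obtain temporal features from the skeletal-edge perspective as well.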
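The four augmentation strategies of contribution (2) can be sketched as simple array transforms. This is a minimal illustration, not the paper's code: it assumes sequences shaped `(frames, joints, 3)`, a vertical y-axis for rotation, and linear interpolation between neighboring frames for the speed change. The abstract does not specify any of these details, and all function names are hypothetical.

```python
import numpy as np


def rotate_about_vertical(seq, angle_rad):
    """Rotate the skeleton about the y-axis (assumed vertical) to vary direction."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, 0.0, s],
                    [0.0, 1.0, 0.0],
                    [-s, 0.0, c]])
    return np.asarray(seq, dtype=float) @ rot.T


def scale_skeleton(seq, factor):
    """Scale all coordinates to simulate a larger or smaller body."""
    return np.asarray(seq, dtype=float) * factor


def shift_skeleton(seq, offset):
    """Translate the whole skeleton by a fixed (x, y, z) offset."""
    return np.asarray(seq, dtype=float) + np.asarray(offset, dtype=float)


def resample_frames(seq, num_frames):
    """Add or remove frames by linear interpolation to simulate a speed change."""
    seq = np.asarray(seq, dtype=float)
    last = seq.shape[0] - 1
    src = np.linspace(0.0, last, num_frames)   # fractional source frame indices
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, last)
    w = (src - lo)[:, None, None]              # interpolation weight per frame
    return (1.0 - w) * seq[lo] + w * seq[hi]
```

Drawing the angle, factor, offset, and frame count at random for each training sample would yield many variants of one recorded action. The sampling ranges are a design choice that the abstract does not state.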
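The idea behind the adaptive multi-scale pooling module of contribution (3), arbitrary-scale input mapped to single-scale output, can be illustrated with a pyramid-style pooling sketch. The abstract does not describe the module's internals, so this is an assumed stand-in rather than the proposed design: it max-pools a `(channels, height, width)` feature map onto fixed grids of several sizes and concatenates the results. The grid sizes `(1, 2, 4)` and both function names are hypothetical.

```python
import math

import numpy as np


def adaptive_max_pool(fmap, out_size):
    """Max-pool a (channels, height, width) map onto an out_size x out_size grid."""
    c, h, w = fmap.shape
    out = np.empty((c, out_size, out_size), dtype=fmap.dtype)
    for i in range(out_size):
        # Bin boundaries adapt to the input height, so any height works.
        r0, r1 = (i * h) // out_size, math.ceil((i + 1) * h / out_size)
        for j in range(out_size):
            c0, c1 = (j * w) // out_size, math.ceil((j + 1) * w / out_size)
            out[:, i, j] = fmap[:, r0:r1, c0:c1].max(axis=(1, 2))
    return out


def multi_scale_pool(fmap, levels=(1, 2, 4)):
    """Concatenate pooled grids of several sizes into one fixed-length vector."""
    return np.concatenate(
        [adaptive_max_pool(fmap, n).reshape(-1) for n in levels]
    )
```

With these assumed levels, the output length is `channels * (1 + 4 + 16)` regardless of the input height and width, so action samples of different durations could feed the same fully connected layer.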