
Human Action Recognition Based On Convolutional Neural Networks

Posted on: 2020-05-30 | Degree: Master | Type: Thesis
Country: China | Candidate: X L Zhang | Full Text: PDF
GTID: 2518306464991479 | Subject: Electronics and Communications Engineering
Abstract/Summary:
Human action features carry a large amount of information about the action category, which makes them crucial in human-computer interaction, intelligent control, video retrieval, and video surveillance. Based on the convolutional neural network framework, this paper studies human action recognition in natural and specific scenes. The research consists of the following two parts:

(1) Human action recognition based on a spatial convolutional neural network. Traditional human action recognition networks extract feature information from video data poorly, so the action category cannot be characterized effectively. To address this problem, this paper designs a lightweight, densely connected human action recognition algorithm based on a spatial convolutional neural network. By constructing a structure of densely connected feature extraction and multi-scale feature combination prediction, the spatial features of the action in an image are fully extracted and fused, which yields more effective class representation information and improves recognition accuracy. In addition, depthwise separable convolution and the CReLU activation function replace the traditional convolution and activation methods in the network, reducing the model parameters and improving recognition speed. To increase the diversity of training samples, the bootstrap method is used to optimize the training of the network model. Finally, a multi-task supervisory loss function is used for the regression of the human bounding box and the classification of the action category to output the recognition result. The experimental results show that the deep network structure designed in this paper has good recognition performance for human actions with different numbers of targets, different viewing angles, multiple poses, and multiple scales in natural scenes and specific scenes. The recognition time per image reached 36 ms, while the recall rate on the test set of teacher actions in the classroom scenario reached 93.56%.

(2) Human action recognition based on a three-dimensional convolutional neural network. Spatial convolution and existing three-dimensional convolution recognition algorithms are still weak at distinguishing human actions with small scales, complex poses, and small inter-class differences in natural scenes and specific scenes, which may lead to misrecognition. To address these cases, this paper proposes a human action recognition algorithm that fuses dense spatio-temporal features based on a three-dimensional convolutional neural network. A three-dimensional densely connected convolutional network serves as the feature extraction backbone, a non-local dense feature computation module is introduced into the dense blocks, and a spatio-temporal pyramid pooling module is introduced before the final fully connected layer. The result is a convolutional neural network that fuses dense spatio-temporal features for 3D video input of varying duration and spatial size, which enhances the model's ability to represent the spatio-temporal feature information of human actions. At the same time, a large-margin softmax loss function with a target-adjustable mechanism is used as the supervisory function to further improve the action recognition accuracy of the network. In multiple groups of comparative experiments, the recognition accuracy of this algorithm on the teacher action dataset in the classroom scene reached 98.52%.
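
Three of the building blocks named in the abstract can be illustrated with a minimal sketch: the parameter saving of depthwise separable convolution, the CReLU activation, and pyramid pooling that maps 3D inputs of any size to a fixed-length vector. This is not the thesis's implementation. The abstract gives no layer sizes or pooling levels, so the channel counts, kernel size, and `levels=(1, 2)` below are assumptions chosen only for illustration, and all function names are hypothetical.

```python
import math

def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Weight count of a k x k depthwise conv plus a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out

def crelu(values):
    """CReLU: concatenate ReLU(x) and ReLU(-x), doubling the output length."""
    return [max(v, 0.0) for v in values] + [max(-v, 0.0) for v in values]

def bin_edges(size, n):
    """Split range(size) into n non-empty, possibly overlapping bins."""
    return [(math.floor(i * size / n), math.ceil((i + 1) * size / n))
            for i in range(n)]

def pyramid_pool_3d(volume, levels=(1, 2)):
    """Max-pool a T x H x W nested list into n*n*n bins per level.

    The output length depends only on `levels`, not on T, H, or W.
    The levels are an assumed configuration, not taken from the thesis.
    """
    t, h, w = len(volume), len(volume[0]), len(volume[0][0])
    pooled = []
    for n in levels:
        for t0, t1 in bin_edges(t, n):
            for h0, h1 in bin_edges(h, n):
                for w0, w1 in bin_edges(w, n):
                    pooled.append(max(
                        volume[a][b][c]
                        for a in range(t0, t1)
                        for b in range(h0, h1)
                        for c in range(w0, w1)))
    return pooled

def make_volume(t, h, w):
    """Hypothetical helper: a T x H x W volume filled with distinct values."""
    return [[[float(a * h * w + b * w + c) for c in range(w)]
             for b in range(h)] for a in range(t)]

if __name__ == "__main__":
    # Assumed layer: 64 -> 128 channels with a 3 x 3 kernel.
    print(conv_params(64, 128, 3), depthwise_separable_params(64, 128, 3))
    print(crelu([1.0, -2.0]))
    # Two clips of different duration and spatial size give equal-length vectors.
    print(len(pyramid_pool_3d(make_volume(4, 6, 6))),
          len(pyramid_pool_3d(make_volume(7, 5, 9))))
```

With the assumed 64-to-128-channel, 3 x 3 layer, the separable form needs 8,768 weights against 73,728 for the standard convolution, which is the kind of reduction the lightweight design relies on. Because the pooled vector length is fixed by the levels alone, a fully connected layer can follow it regardless of clip length or frame size.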
Keywords/Search Tags: human action recognition, convolutional neural network, densely connected, lightweight, multi-scale feature fusion, three-dimensional convolution, fusion of spatio-temporal dense features