| Human action recognition plays an increasingly important role in machine vision applications such as intelligent security, virtual reality, and safe driving. However, because real-world images contain complex backgrounds and varied body shapes, accurate recognition of human actions is difficult. This paper studies human action recognition in two-dimensional static images by combining human skeleton keypoint detection with deep learning. The main work of this paper is as follows: 1. We analyze existing human action recognition techniques and keypoint detection networks. Based on the characteristics of these methods, we propose a framework that fuses keypoint detection with human action recognition, in which action recognition is treated as a two-dimensional image classification task. The framework takes the form of a multi-task network: the output of the keypoint detection network is fed into the trunk classification network to assist and strengthen classification and to improve the accuracy of action recognition. 2. ResNet-50 and CPN are adopted and improved as the base networks for trunk action classification and keypoint detection, respectively. To further improve classification performance, a modified TridentNet is added to the backbone network. TridentNet uses dilated convolution and weight sharing to expand the receptive field while keeping the number of parameters under control. With our modifications, the trident structure is better suited to a classification network and integrates more cleanly into ResNet-50. 3. A Cascaded Pyramid Network (CPN) with a CBAM attention module is proposed to address the low accuracy caused by background noise, occlusion, and lighting conditions in multi-person keypoint detection. CBAM can independently learn the
importance of each location in the channel and spatial dimensions of the feature map to the result. By introducing attention parameters at the deepest part of the CPN, the detection accuracy for uncertain keypoints is further improved. 4. We conduct experiments on the proposed model and algorithm. The original ResNet-50, ResNet-50 with TridentNet, and the three-branch network are compared on four datasets; the experiments show that the improved ResNet-50 has fewer parameters and converges faster. For the keypoint detection network, we design a comparison experiment and an ablation experiment on the MSCOCO dataset, comparing CPN with and without the CBAM attention module and comparing CPN against four other keypoint detection networks; the results show that the improved CPN achieves better accuracy on the multi-person task. For the overall action recognition framework, we design two further experiments on the Pascal VOC dataset: one compares the network that fuses keypoints with the network that does not, and the other compares the improved network with five other action recognition methods. On this basis, we also plot a confusion matrix. These experiments show that the improved human action recognition network, which integrates human keypoint information, identifies human actions in still images more accurately. |
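
The statement that dilated convolution with weight sharing expands the receptive field while keeping the parameter count under control can be illustrated with a small one-dimensional sketch. This is not the network described in the abstract: the function names, the three-tap kernel, the toy signal, and the branch dilations (1, 2, 3) are all assumptions chosen for illustration. The only point is that each branch reuses the same weights, so the receptive field grows with the dilation while the number of parameters stays fixed.

```python
from typing import List, Sequence


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Input span covered by one output of a dilated convolution."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal: Sequence[float],
                   kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """Unpadded 1-D cross-correlation whose taps are `dilation` samples apart."""
    span = receptive_field(len(kernel), dilation)
    outputs = []
    for start in range(len(signal) - span + 1):
        outputs.append(
            sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        )
    return outputs


# One weight set, reused by every branch (hypothetical values).
shared_kernel = [1.0, 0.0, -1.0]
# Assumed branch dilations for this illustration.
dilations = (1, 2, 3)
# Toy input: squares of 0..9.
signal = [float(x * x) for x in range(10)]

# Each branch sees a different receptive field through the same weights.
branches = {d: dilated_conv1d(signal, shared_kernel, d) for d in dilations}
fields = {d: receptive_field(len(shared_kernel), d) for d in dilations}

# Because the weights are shared, the parameter count does not depend
# on how many branches there are.
param_count = len(shared_kernel)

for d in dilations:
    print(f"dilation={d} receptive_field={fields[d]} outputs={len(branches[d])}")
print(f"parameters shared across {len(dilations)} branches: {param_count}")
```

Without weight sharing, three branches would need three separate kernels, so the parameter count would triple; with sharing it stays at the size of a single kernel regardless of the number of branches.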