Vision-based human action recognition aims to give computers the ability to understand human activities in natural scenes and to make intelligent decisions based solely on visual observation. Starting from mainstream deep learning methods, this paper first applies a human object detection algorithm to locate person regions in images or video, then uses a pose estimation algorithm to extract human skeleton keypoints from the detected regions, and finally uses a human action recognition algorithm to recognize and classify actions from the skeleton data. To examine the feasibility of deploying the models on mobile devices, low-bit model quantization is applied to compress the model parameters and reduce memory consumption, thereby speeding up inference on the mobile side. The specific contents of the study are as follows:

(1) The lightweight YOLOv5s object detection model is trained on a person-detection dataset extracted from the COCO dataset to obtain a human object detection model. The model achieves an accuracy of 79% and handles occluded targets well.

(2) The AlphaPose pose estimation algorithm, built on the YOLOv5s human detector from (1), is trained on the COCO human keypoints dataset to produce a pose estimation model for extracting skeleton keypoints. Compared with the official AlphaPose model trained with a YOLOv3-SPP detector, the model in this paper is clearly smaller and faster while delivering a comparable recognition effect.

(3) A Spatio-Temporal Graph Convolutional Network (ST-GCN) is trained for human action recognition and classification on a human skeleton dataset extracted from the Le2i fall detection dataset with the pose estimation model above. Experimental results show that the model accurately identifies specific action categories in simple scenes captured by surveillance cameras.

(4) 8-bit and 4-bit quantization experiments on the human object detection and pose estimation models are carried out on the Magik artificial intelligence development platform. The results show that quantization-aware training greatly reduces the model parameters with little loss of accuracy, and inference on the T40 development board becomes substantially faster. Compared with the original 32-bit floating-point model, the 8-bit quantized model is four times smaller and about four times faster, and the 4-bit quantized model is eight times smaller and about eight times faster.
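To make the skeleton-based recognition stage in (3) more concrete, the following is a minimal PyTorch sketch of one spatio-temporal graph convolution block in the spirit of ST-GCN. The layer sizes, the identity adjacency placeholder, and the 17-joint COCO-style skeleton assumed here are illustrative only, not the exact architecture or graph used in this paper.

```python
import torch
import torch.nn as nn

class STGCNBlock(nn.Module):
    """One spatio-temporal graph convolution block (illustrative sketch).

    Input x has shape (N, C, T, V): batch, channels, frames, joints.
    A is a normalized (V, V) adjacency matrix of the skeleton graph.
    """
    def __init__(self, in_channels, out_channels, A, t_kernel=9):
        super().__init__()
        self.register_buffer("A", A)
        self.spatial = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.temporal = nn.Conv2d(out_channels, out_channels,
                                  kernel_size=(t_kernel, 1),
                                  padding=(t_kernel // 2, 0))
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.spatial(x)                           # per-joint channel mixing
        x = torch.einsum("nctv,vw->nctw", x, self.A)  # aggregate features over connected joints
        x = self.temporal(x)                          # convolve along the time axis
        return self.relu(x)

if __name__ == "__main__":
    V = 17                            # COCO-style skeleton with 17 keypoints (assumed)
    A = torch.eye(V)                  # placeholder adjacency; a real model uses the skeleton graph
    block = STGCNBlock(3, 64, A)      # 3 input channels: (x, y, confidence) per joint
    clip = torch.randn(2, 3, 30, V)   # 2 clips, 30 frames each
    print(block(clip).shape)          # -> torch.Size([2, 64, 30, 17])
```

A full action classifier would stack several such blocks, pool over frames and joints, and finish with a linear layer over the action categories.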
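The storage figures in (4) follow directly from the bit widths (32/8 = 4, 32/4 = 8). The snippet below is a generic illustration of the fake-quantization idea behind quantization-aware training, not the Magik toolchain itself; the symmetric per-tensor scheme and the straight-through estimator shown here are common choices assumed for the example.

```python
import torch

def fake_quantize(w, num_bits=8):
    # Symmetric per-tensor fake quantization with a straight-through estimator:
    # the forward pass sees weights rounded to a low-bit grid, while the backward
    # pass treats the rounding as identity so gradients still flow during training.
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8-bit, 7 for 4-bit
    scale = w.detach().abs().max() / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    return w + (w_q - w).detach()               # straight-through estimator

w = torch.randn(64, 64)
for bits in (8, 4):
    err = (fake_quantize(w, bits) - w).abs().mean()
    size_ratio = 32 / bits                      # storage shrinks 4x at 8-bit, 8x at 4-bit
    print(f"{bits}-bit: ~{size_ratio:.0f}x smaller, mean abs weight error {err:.4f}")
```

The printout shows the trade-off the experiments in (4) report: lower bit widths shrink the model further but introduce larger quantization error, which quantization-aware training compensates for during fine-tuning.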