The mechanical manufacturing industry is a cornerstone of China's industrial sector. With changing market demands, the industry urgently needs to shift from large-scale standardized production to customized, multi-variety, small-batch production. Human-computer interaction and deep-learning-based artificial intelligence technology provide the possibility and the means for this transition. In human-robot collaborative manufacturing, intelligent robots detect and understand human behavior and intentions through action recognition, enabling highly customized and automated production. This article establishes a human behavior recognition algorithm based on video streams to recognize human movements, and verifies its performance both on public datasets and in real-world scenarios. The algorithm consists of three parts: a human target detection model, a human pose estimation model, and a human action recognition model.

Based on the YOLOv4 network, the ABYOLOv4 (ASPP + Bi-FPN + YOLOv4) human target detection model was constructed to locate human targets in the image. First, to adapt YOLOv4 to the human detection task, the multi-class detection model was simplified to a single-class model that detects only humans. Then, to address the low detection accuracy and frequent missed detections of medium- and small-scale human targets in complex visual scenes, an ASPP module was introduced into YOLOv4 and an additional middle-layer convolutional input was added to build a double Bi-FPN (a minimal ASPP sketch is given after the model descriptions). Evaluation on public datasets shows that the model achieves higher accuracy with a smaller model size, striking a balance between the two.

Based on the TransPose network, the VTTransPose (V block + twin attention + TransPose) human pose estimation model was constructed to estimate the coordinates of human keypoints within the detected region. First, a sparse representation of the self-attention mechanism was introduced into TransPose to reduce computational cost and improve network efficiency (a generic sparse-attention sketch is also given below). Then, an intra-layer feature fusion module, the V block, was constructed to strengthen the network's ability to localize keypoints. Evaluation on public datasets shows that VTTransPose achieves higher detection accuracy with a smaller model size and lower computational cost, and can locate human keypoints accurately.

Based on the PoseC3D network, the TPoseC3D (TPN + PoseC3D) human action recognition model was constructed to generate stacked three-dimensional keypoint heat maps and perform human action recognition. To address the difficulty of distinguishing actions with similar temporal rates, a Temporal Pyramid Network (TPN) was inserted between the backbone and the prediction head of the original PoseC3D to fuse features of actions with different visual tempos and enhance the network's ability to discriminate between actions (see the temporal pyramid sketch below). The experimental results show that TPoseC3D performs well on human action recognition tasks.
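The exact ASPP configuration used in ABYOLOv4 is not specified in this summary, so the following PyTorch sketch shows a generic ASPP module in the common DeepLab style; the dilation rates (1, 6, 12, 18), the image-level pooling branch, and the channel sizes in the usage example are illustrative assumptions rather than details of the paper's model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel dilated convolutions sample
    context at several receptive-field sizes, which helps with human targets
    of very different scales."""

    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):  # assumed rates
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch,
                          kernel_size=1 if r == 1 else 3,
                          padding=0 if r == 1 else r,
                          dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # Image-level branch: global pooling injects scene-wide context.
        self.image_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [branch(x) for branch in self.branches]
        feats.append(F.interpolate(self.image_pool(x), size=(h, w),
                                   mode="bilinear", align_corners=False))
        return self.project(torch.cat(feats, dim=1))

# Example: a 13x13 backbone feature map, as in YOLO-style detectors.
y = ASPP(512, 256)(torch.randn(2, 512, 13, 13))  # -> (2, 256, 13, 13)
```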
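The summary names a "twin attention" sparsification of TransPose's self-attention but does not define it. As a stand-in, the sketch below illustrates the general principle of sparse attention with simple non-overlapping window attention, which is not necessarily the paper's mechanism; the window size and head count are placeholders.

```python
import torch
import torch.nn as nn

class WindowSelfAttention(nn.Module):
    """Self-attention restricted to non-overlapping windows of tokens.

    Dense attention over N tokens costs O(N^2); attending only within
    windows of size w reduces this to roughly O(N * w)."""

    def __init__(self, dim, num_heads=8, window=49):  # placeholder values
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, tokens):
        # tokens: (B, N, C); N must be divisible by the window size.
        b, n, c = tokens.shape
        w = self.window
        x = tokens.reshape(b * (n // w), w, c)  # fold each window into the batch
        out, _ = self.attn(x, x, x)             # dense attention inside windows only
        return out.reshape(b, n, c)

# Example: 196 tokens (a flattened 14x14 feature map), 256 channels.
out = WindowSelfAttention(256)(torch.randn(2, 196, 256))  # -> (2, 196, 256)
```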
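A temporal pyramid fuses features pooled at several temporal rates so that fast and slow performances of an action produce comparable representations. The sketch below is a simplified, single-stage version under assumed tensor shapes; the published TPN additionally aggregates features from multiple backbone stages.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalPyramidFusion(nn.Module):
    """Pools clip features at several temporal rates, stretches the slow
    branches back to the original clip length, and fuses all branches, so
    that actions performed at different speeds yield comparable features."""

    def __init__(self, channels, rates=(1, 2, 4)):  # assumed pyramid rates
        super().__init__()
        self.rates = rates
        self.fuse = nn.Conv3d(channels * len(rates), channels, kernel_size=1)

    def forward(self, x):
        # x: (B, C, T, H, W) spatio-temporal features from the 3D backbone.
        t, h, w = x.shape[2:]
        levels = []
        for r in self.rates:
            y = x if r == 1 else F.max_pool3d(x, (r, 1, 1), stride=(r, 1, 1))
            if r > 1:  # re-align the subsampled branch to T frames
                y = F.interpolate(y, size=(t, h, w), mode="trilinear",
                                  align_corners=False)
            levels.append(y)
        return self.fuse(torch.cat(levels, dim=1))

# Example: a 48-frame clip of 64-channel features at 16x16 resolution.
feats = TemporalPyramidFusion(64)(torch.randn(2, 64, 48, 16, 16))
```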
By combining the three models above, experiments were carried out in real scenarios: ten action categories were recorded with cameras in different scenes. The results show that the ABYOLOv4 human target detection model has a good overall detection effect and is not easily affected by changes in human scale, although missed detections occur when humans overlap over a large area. The VTTransPose human pose estimation model handles changes in human scale and viewing angle as well as slight occlusion well, with strong robustness; however, under large-area occlusion the predicted keypoints become inaccurate and their positions fluctuate. The TPoseC3D human action recognition network achieves high recognition accuracy for actions involving large limb movements and can recognize action categories correctly even when some historical information is lost, showing strong robustness.
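To make the interface between the pose estimation stage and the action recognition stage concrete, the sketch below shows one way to render per-frame keypoints into the stacked three-dimensional heatmap volume that PoseC3D-style networks consume. The heatmap resolution, Gaussian sigma, 17-keypoint layout, and the omission of per-keypoint confidence weighting are simplifying assumptions.

```python
import torch

def stack_keypoint_heatmaps(keypoints_per_frame, heatmap_size=(64, 64), sigma=2.0):
    """Renders per-frame keypoint coordinates as Gaussian heatmaps and stacks
    them into a (K, T, H, W) volume, the kind of input a PoseC3D-style
    network consumes.

    keypoints_per_frame: list of T tensors of shape (K, 2) holding (x, y)
    coordinates normalized to [0, 1].
    """
    h, w = heatmap_size
    t = len(keypoints_per_frame)
    k = keypoints_per_frame[0].shape[0]
    ys = torch.arange(h, dtype=torch.float32).view(h, 1)
    xs = torch.arange(w, dtype=torch.float32).view(1, w)
    volume = torch.zeros(k, t, h, w)
    for ti, kps in enumerate(keypoints_per_frame):
        for ki, (x, y) in enumerate(kps):
            cx, cy = float(x) * (w - 1), float(y) * (h - 1)
            # 2D Gaussian centered on the keypoint, broadcast over (H, W).
            volume[ki, ti] = torch.exp(
                -((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2)
            )
    return volume

# Example: 17 COCO-style keypoints over a 48-frame clip.
clip = [torch.rand(17, 2) for _ in range(48)]
heatmaps = stack_keypoint_heatmaps(clip)  # shape (17, 48, 64, 64)
```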