Research On Human Pose Estimation Based On Deep Learning

Posted on: 2024-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Cai
GTID: 2568307100989479
Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
Human pose estimation, also known as keypoint detection, is a challenging problem in computer vision and deep learning. Its main goal is to use a human pose estimation network to infer the keypoints of all human instances in a given image or video and to generate human skeletons by combining prior knowledge (e.g., connection relationships between keypoints, left-right symmetry of the human body, and proportional relationships between body parts). The task has important research significance and application value in behavior recognition, motion capture, pose tracking, and human-computer interaction.

Several problems remain to be solved. First, the convolutional layers of current human pose estimation networks can only extract local features, which makes the networks inefficient at extracting spatial and multi-channel feature information and ultimately limits their accuracy. Second, the networks have a large number of parameters and high computational complexity, which leads to long running times. To address these problems, this paper improves the high-resolution network (HRNet) using attention mechanisms and lightweight techniques, with the aim of reducing computational complexity and the number of parameters while maintaining accuracy, and constructs a lightweight model based on the high-resolution network. The main work of this paper is as follows:

(1) An optimized model, ENHRNet, is proposed. Building on the high-resolution network, it introduces a channel attention module and a spatial attention module to modify the residual module of HRNet. To reduce the complexity of the spatial attention module, a cross-attention mechanism is then introduced, which improves the efficiency of the optimized model. The model is designed to address the lack of global contextual relationships in the output feature maps of HRNet; the output features are readjusted across the channel domain and the spatial domain so that the model can better capture diverse contextual information. To verify the proposed method, ENHRNet was trained on both the COCO dataset and the MPII dataset, and the accuracy of human keypoint detection improved.

(2) The computational complexity and number of parameters of HRNet are reduced by introducing the Ghost lightweight module. This module generates a small set of intrinsic feature maps using fewer convolution operations than the original network, then derives the remaining redundant feature maps from them using simple linear transformations, which substantially reduces computation and model size while maintaining the same or better model performance. The Coordinate Attention module is then introduced to improve the keypoint detection performance of the lightweight model. Finally, the model is tested on the COCO and MPII datasets, and the results show that the optimized network maintains high keypoint detection accuracy while reducing computational complexity and the number of parameters.
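
To give a sense of the savings that the Ghost module mentioned in the abstract can provide, the following is a minimal sketch that compares the weight count of a standard convolution with that of a Ghost-style replacement. It assumes the common GhostNet-style formulation (a primary convolution producing `c_out / ratio` intrinsic maps, followed by a depthwise "cheap" convolution producing the rest). The function names, the `ratio` of 2, and the 3x3 kernel sizes are illustrative assumptions, not values taken from the thesis, and bias terms are ignored.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_module_params(c_in: int, c_out: int, k: int,
                        ratio: int = 2, cheap_k: int = 3) -> int:
    """Weights in a Ghost-style module (bias ignored).

    Assumed structure (hypothetical configuration, not from the thesis):
      1. a primary k x k convolution produces c_out / ratio intrinsic maps;
      2. a depthwise cheap_k x cheap_k convolution derives the remaining
         (ratio - 1) * c_out / ratio maps from the intrinsic ones.
    """
    if c_out % ratio != 0:
        raise ValueError("c_out must be divisible by ratio")
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k
    cheap = intrinsic * (ratio - 1) * cheap_k * cheap_k
    return primary + cheap


if __name__ == "__main__":
    # Example layer: 64 input channels, 64 output channels, 3x3 kernel.
    std = standard_conv_params(64, 64, 3)
    ghost = ghost_module_params(64, 64, 3)
    print(f"standard: {std}, ghost: {ghost}, reduction: {std / ghost:.2f}x")
```

Under these assumptions the reduction approaches the chosen `ratio` as the number of input channels grows, because the depthwise cheap operation contributes few weights compared with the primary convolution.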
Keywords/Search Tags: high-resolution networks, attention mechanisms, model lightweighting, human pose estimation