
Research On Key Point Estimation Method Of 2D Multi-hand Attitude Based On Cascaded Parallel Convolutional Neural Network

Posted on: 2022-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Li
Full Text: PDF
GTID: 2518306527955079
Subject: Electronics and Communications Engineering
Abstract/Summary:
Human-computer interaction applications such as smart homes, in-vehicle gesture control and smart medical care are increasingly adopting gesture interaction, and estimating the key points of the hand pose is critical to that technology. With the continuous development of deep learning, hand pose key point estimation has advanced rapidly. However, two problems remain. First, current research mainly detects the key points of a single hand. In more realistic and complex scenes, a complete image contains not only multiple hands but also other content such as human bodies and background. Second, data sets are in short supply. Training a network requires high-quality data, yet the multi-hand RGB data sets currently available are limited in number and low in quality, which makes multi-hand estimation challenging. To address these problems, this thesis carries out the following research:

(1) To address the restriction of current research to single-hand key points, this thesis proposes a two-dimensional multi-hand pose key point estimation model (Hrnet-Hand) based on a cascaded parallel convolutional neural network. The model follows a two-stage design: find the hands first, then locate the key points. The first stage uses the YOLO object detection network to locate all hands in the image and extracts the center point of each output bounding box as part of the input to the second stage. The second stage adapts HRNet to the hand pose key point estimation task through transfer learning. This network maintains high resolution while fusing multi-scale features in parallel, and a two-dimensional attention mechanism is added after the network's initial convolutions to extract features. The attention mechanism lowers the weights of feature pixel values that have little influence on the target region and raises the weights of those that have high influence. This enhances the expressiveness of the network and makes the heat maps of the multi-hand key points spatially more accurate.

(2) To address the shortage of data sets, this thesis annotates the existing public multi-hand RGB data sets so that every hand in each picture has corresponding key points, which improves the quality of the data. In addition, we build our own data set, DCD8-6000, from real hand images and annotate it manually with high quality.

To verify the effectiveness of the proposed method, multiple groups of comparative experiments were conducted against three classical hand pose key point estimation algorithms on three data sets with rich backgrounds, occlusion and complex gestures: MPII-Hand, NZSL and DCD8-6000. The results show that the proposed model is effective: it performs multi-hand pose key point estimation from a single RGB image only and improves detection accuracy.
Keywords/Search Tags:RGB image, Multi-hand Datasets, Key Point Estimation, Attention Mechanism
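
The second-stage ideas summarized in the abstract (passing on the center of each detected box, reweighting feature pixels by attention, and reading key points off heat maps) can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation: the function names `box_center`, `spatial_attention` and `decode_heatmaps` are hypothetical, the attention score here is a fixed average-plus-max over channels standing in for whatever learned layers the thesis uses, and the argmax decoding is an assumed, common way to turn a heat map into a coordinate.

```python
import numpy as np


def box_center(box):
    """Center (cx, cy) of a box given as (x_min, y_min, x_max, y_max).

    Hypothetical helper: stands in for taking the center of a detector's box.
    """
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def spatial_attention(features):
    """Reweight a (C, H, W) feature map with one weight per pixel.

    Assumption: the per-pixel score is the channel-wise mean plus the
    channel-wise max, squashed by a sigmoid. A trained model would learn
    this scoring function; no parameters are learned here.
    """
    avg_map = features.mean(axis=0)          # (H, W)
    max_map = features.max(axis=0)           # (H, W)
    score = avg_map + max_map                # stand-in for learned layers
    weights = 1.0 / (1.0 + np.exp(-score))   # sigmoid, values in (0, 1)
    # Low-score pixels are damped more than high-score pixels.
    return features * weights[None, :, :], weights


def decode_heatmaps(heatmaps):
    """Turn (K, H, W) heat maps into a (K, 2) array of (x, y) peak positions."""
    k, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(k, -1).argmax(axis=1)
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys], axis=1)


if __name__ == "__main__":
    print(box_center((10, 20, 30, 60)))

    rng = np.random.default_rng(0)
    feats = rng.random((3, 4, 5))
    weighted, weights = spatial_attention(feats)
    print(weighted.shape, weights.shape)

    maps = np.zeros((2, 4, 5))
    maps[0, 1, 3] = 1.0   # key point 0 peaks at x=3, y=1
    maps[1, 2, 0] = 1.0   # key point 1 peaks at x=0, y=2
    print(decode_heatmaps(maps))
```

In a multi-hand setting, a decoding step like this would be run once per detected hand, so each hand gets its own set of key point coordinates.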