In the era of the mobile internet, application scenarios such as the smart home, wearable devices, and intelligent transportation bring people into ever closer interaction with smart terminal devices, so accurate and efficient gesture interaction has important application value. Hand segmentation and gesture detection are key research directions for gesture interaction, and together they determine the user experience in gesture interaction scenarios. However, owing to the flexibility and diversity of the human hand, it is often difficult to achieve satisfactory accuracy and efficiency. Traditional methods rely on manually designed features, which leads to complex algorithm pipelines and poor robustness. With the rapid development of deep learning, feature learning methods based on convolutional neural networks have become a research hotspot. In this thesis, we apply a semantic segmentation network and an object detection network to hand segmentation and gesture detection, respectively, and propose corresponding optimization schemes.

In the hand segmentation task, to address the inaccurate segmentation of hand contour pixels, the SA-FPN feature fusion module is designed to fuse the detail information of shallow features with the semantic information of deep features and, guided by an attention mechanism, to learn the salient information of the features. To address the imbalance between bright and dark hand images in the OUHANDS dataset, which causes unsatisfactory segmentation of low-brightness images, a random gamma correction mechanism is introduced so that the network can learn hand features under different brightness levels during training. The experimental results show that the hand segmentation network accurately segments the hand contour and the irregular regions of the fingers, and that segmentation of hands in dim light is improved. The network reaches an mIoU of 90.82%, which is 1.52% higher than the baseline network PSPNet.

In the gesture detection task, to improve the localization accuracy of gestures, an edge feature extraction branch is proposed, forming a parallel dual-branch backbone that guides the network to learn the edge features of the gesture. To improve the classification confidence of gesture categories, the RF-SPP receptive field enhancement module is designed and implemented to refine the preliminary effective features and enhance the network's ability to integrate context information. The experimental results show that the gesture detection network obtains more accurate gesture positions and higher gesture classification confidence. The mAP reaches 87.10%, an increase of 2.48% over the baseline network YOLOv4-Tiny. In addition, to reduce interference from complex backgrounds, this thesis uses the hand segmentation network as a pre-processing stage for gesture detection, raising the mAP to 88.44%, an overall increase of 3.82%.
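
The random gamma correction mentioned above is a standard brightness augmentation. Below is a minimal sketch of how such an augmentation might be applied to training images, assuming uint8 RGB inputs; the function name, gamma range, and sampling scheme are illustrative assumptions, not parameters taken from the thesis.

```python
import numpy as np

def random_gamma_correction(image, gamma_range=(0.5, 2.0), rng=None):
    """Apply gamma correction with a randomly sampled exponent.

    Assumes `image` is a uint8 array of shape (H, W, C). The gamma range
    is an illustrative choice: gamma < 1 brightens the image, gamma > 1
    darkens it, so the network sees hands at varied brightness levels.
    """
    rng = rng or np.random.default_rng()
    gamma = rng.uniform(*gamma_range)
    # Normalize to [0, 1], apply the power-law transform, and rescale.
    normalized = image.astype(np.float32) / 255.0
    corrected = np.power(normalized, gamma)
    return (corrected * 255.0).astype(np.uint8)
```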