Natural Hand Gesture Segmentation And Semantic Recognition Based On Mask R-CNN

Posted on: 2021-02-28 | Degree: Master | Type: Thesis
Country: China | Candidate: R Zhang | Full Text: PDF
GTID: 2428330629482567 | Subject: Computer Science and Technology
Abstract/Summary:
Hand gestures are a simple and intuitive form of interaction between people. With the rapid development of artificial intelligence and computer vision, gesture recognition has moved from reliance on external auxiliary equipment to approaches based on computer vision. Augmented reality, a new human-computer interaction technology, combines real-world scenes with computer-generated virtual information such as text, images, audio, and video. It uses this virtual information to complement the real world, so that virtual information and the real environment are displayed in real time on the same screen or in the same space, and users can observe and analyze data and physical objects within real scenes. In recent years, augmented reality has become a research hotspot for scholars at home and abroad. By bringing natural gestures into augmented reality systems and letting them interact with virtual objects, this work aims to address some of the main problems of virtual-real interaction in scenes and to create a more immersive interactive experience.

To achieve fine segmentation and accurate semantic recognition of natural hand gestures, and in view of the shortcomings of existing recognition algorithms, such as low recognition rate, poor robustness, and poor segmentation accuracy, this thesis proposes a hand gesture segmentation and recognition method based on Mask R-CNN. The method uses a feature pyramid network with multi-scale feature fusion, optimizes the candidate window classifier, and introduces a pixel-level segmentation mask based on a scoring strategy. First, multi-scale feature fusion is applied to the feature pyramid network in the Mask R-CNN backbone, so that it has a bottom-up reverse connection, lateral connections, and fused multi-scale feature maps. Second, Dropout layers are added to the feature extraction network and the RoIAlign structure of the candidate window classifier to prevent overfitting during training. Finally, the mask scoring strategy MaskIoU Head is introduced to improve the mask branch and achieve accurate mask segmentation. Based on an analysis of the overall augmented reality pipeline, and with image segmentation and gesture recognition at its core, the thesis thus improves Mask R-CNN in three ways: the multi-scale fusion feature pyramid extends the feature extraction network, the candidate window classifier is improved to prevent overfitting during training, and the pixel-level segmentation mask structure is optimized.

Experimental results show that the improved Mask R-CNN algorithm effectively avoids overfitting during training. Compared with traditional algorithms, it achieves a higher gesture recognition rate and better segmentation accuracy and robustness. By combining the depth information of the gesture with the segmentation result, the position of the gesture is detected, thereby realizing simple virtual-real interaction.
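
The mask scoring idea mentioned in the abstract can be illustrated with a small sketch. It assumes the MaskIoU Head follows the usual mask scoring formulation, in which a branch is trained to predict the IoU between a predicted mask and its ground-truth mask, and the final mask score is the classification confidence multiplied by that predicted IoU. The abstract does not give the thesis's exact formulation, so the function names `mask_iou` and `mask_score` and the toy masks below are hypothetical, and the learned IoU prediction is replaced here by the true IoU purely for illustration.

```python
from typing import Sequence

# A binary mask is represented as rows of 0/1 values (illustrative choice).
Mask = Sequence[Sequence[int]]


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of identical shape."""
    if len(pred) != len(target) or any(
        len(p) != len(t) for p, t in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks have no union; return 0.0 instead of dividing by zero.
    return inter / union if union else 0.0


def mask_score(cls_score: float, predicted_iou: float) -> float:
    """Assumed rescoring rule: classification confidence times mask IoU."""
    return cls_score * predicted_iou


# Toy example: the predicted hand mask misses one column of the ground truth.
pred = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
gt = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

iou = mask_iou(pred, gt)       # 4 overlapping pixels / 6 pixels in the union
score = mask_score(0.9, iou)   # a confident class score is scaled down
print(f"IoU = {iou:.3f}, mask score = {score:.3f}")
```

The point of such rescoring is that a detection with a confident class label but a poorly fitting mask is ranked lower than it would be by classification confidence alone.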
Keywords/Search Tags:Augmented Reality, Hand Gesture Segmentation, Instance Segmentation, Mask R-CNN, Virtual-Reality Interaction