Research On Real-time Dynamic Gesture Recognition Based On RGB Video Stream

Posted on: 2022-09-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y R Tang
GTID: 2518306338486764
Subject: Computer Science and Technology
Abstract/Summary:
As computing devices of many forms enter everyday life, gesture recognition, the most primitive and natural mode of human-computer interaction, has been applied in many scenarios that rely on professional sensing devices. With the growth of video applications in recent years, there is increasing demand for dynamic gesture recognition on ordinary devices as well. However, gesture recognition on RGB video streams captured by PCs and mobile devices typically faces untrimmed video streams, cluttered backgrounds, and loss of hand information. Real-time recognition of dynamic gestures is further constrained by limited computational resources and by the challenge of single-time activation. Balancing accuracy against real-time performance has therefore become a major challenge in bringing dynamic gesture recognition to everyday applications.

To improve the performance of three-dimensional convolutional networks (3D CNNs) for dynamic gesture recognition in the RGB modality, this thesis proposes TVMT, an end-to-end multimodal training architecture based on TVNet, an optical-flow-like feature extraction network. TVNet is formed by unfolding the iterations of the traditional TV-L1 algorithm into TVNet layers. The optical-flow-like features it extracts from static RGB image frames serve as auxiliary modal information in multimodal training. A spatiotemporal semantic alignment (SSA) loss function encourages the networks for different modalities to learn a common understanding of the same input scene, so that the RGB network can exploit the spatiotemporal gesture features learned by the optical-flow-like network. A focal regularization parameter (FRP) is applied to prevent negative transfer of knowledge.

To improve the robustness and efficiency of real-time dynamic gesture recognition, this thesis proposes DtSWORS, a sliding-window-based online recognition system with a dynamic decision threshold. A sliding window processes the untrimmed video stream, and a wake-up mechanism allows the offline 3D CNN to run online. Detection results are cached and filtered by a post-processing module to handle detection errors. To address the single-time activation challenge, a reinforcement learning network learns the decision thresholds of the single-time activation module, so that the thresholds adjust dynamically to the characteristics of each input video stream.
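
The online recognition loop described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `online_recognize`, the label id `NO_GESTURE`, the averaging filter, and the fixed `threshold` are all assumptions made for illustration. In particular, the thesis learns the decision thresholds with a reinforcement learning network and uses a wake-up mechanism, neither of which is reproduced here, and a toy classifier stands in for the 3D CNN.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Optional, Tuple

NO_GESTURE = 0  # hypothetical label id reserved for "no gesture"


def online_recognize(
    frames: Iterable[int],
    classify: Callable[[List[int]], List[float]],
    window_size: int = 8,
    history: int = 3,
    threshold: float = 0.6,  # fixed here; the thesis learns it dynamically
) -> Iterator[Tuple[int, int]]:
    """Yield (frame_index, label) at most once per detected gesture."""
    window: Deque[int] = deque(maxlen=window_size)       # sliding window of frames
    recent: Deque[List[float]] = deque(maxlen=history)   # cached detection results
    active: Optional[int] = None                          # gesture already reported

    for i, frame in enumerate(frames):
        window.append(frame)
        if len(window) < window_size:
            continue  # wait until the first full window is available

        probs = classify(list(window))
        recent.append(probs)

        # Post-processing (assumed form): average the cached class scores
        # to suppress isolated misdetections.
        n_classes = len(probs)
        avg = [sum(p[c] for p in recent) / len(recent) for c in range(n_classes)]
        label = max(range(n_classes), key=avg.__getitem__)

        if label != NO_GESTURE and avg[label] >= threshold:
            if label != active:  # single-time activation: report only once
                active = label
                yield i, label
        else:
            active = None  # gesture ended, allow the next activation


def toy_classifier(window: List[int]) -> List[float]:
    """Stand-in for a 3D CNN: score each class by its share of the window."""
    return [window.count(c) / len(window) for c in range(3)]


# Untrimmed toy stream: each "frame" is simply its ground-truth label.
stream = [0] * 10 + [1] * 12 + [0] * 10 + [2] * 12 + [0] * 10
events = list(online_recognize(stream, toy_classifier))
print(events)
```

Each gesture in the toy stream spans many overlapping windows, yet it is reported only once, because `active` blocks repeated activations until the filtered score falls back below the threshold.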
Keywords/Search Tags: human-computer interaction, gesture recognition, 3D CNN, reinforcement learning, single-time activation