
Gesture Recognition Based On Multi-modal Fusion Of RGB-D Images

Posted on: 2020-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: W T Cheng
Full Text: PDF
GTID: 2428330572475639
Subject: Mechanical engineering
Abstract/Summary:
As a natural and convenient means of interaction, non-contact gesture recognition based on computer vision plays an increasingly important role in human-computer interaction applications. With the development of depth sensors, there are more ways to acquire RGB-D images, and the acquisition cost is steadily falling. At the same time, introducing the depth information in RGB-D gesture images can overcome the influence of background and illumination changes and improve the performance of recognition algorithms. However, how to make full use of the rich texture information and spatial scene information in RGB-D images remains a problem to be addressed. The emergence of deep learning techniques such as convolutional neural networks provides an effective way to exploit RGB-D images thoroughly. To address these problems, a Kinect sensor is used to acquire RGB-D images, and a multi-modal fusion gesture recognition method based on RGB-D images is studied. The specific research contents are as follows:

(1) Aiming at the complementarity between the different modalities of an RGB-D image and the complementarity between multi-level features, a multi-modal, multi-level feature extraction method based on a dual-stream convolutional neural network is proposed. Two residual networks are constructed to extract the features of the different modalities and output them at each convolution level, yielding features of different abstraction levels and different modalities for subsequent processing.

(2) Based on the above features, and considering that the different modalities influence the final recognition result to different degrees, an adaptive feature-weight learning algorithm is designed. By explicitly dividing the features into two parts, independent features and shared features, and then fusing them according to adaptive weights, a more compact and discriminative multi-modal fusion feature is obtained.

(3) A gesture classification and recognition model is designed, along with its structure and parameters. After feature fusion, the multi-modal fusion features at different abstraction levels are ordered sequentially and fed into an LSTM network. The output of the network is connected to a Softmax layer, which finally produces the gesture classification predictions.

(4) A multi-modal gesture recognition system is built to realize interaction between the human hand and the computer and to verify the effectiveness of the gesture recognition method in this thesis. The algorithm is packaged with the Tkinter GUI development toolkit, and the system interface and functions are designed. Finally, the feasibility and accuracy of the gesture recognition algorithm are verified.
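The dual-stream, multi-level extraction in (1) can be pictured as two parallel streams, one per modality, each emitting a feature map after every stage. A minimal pure-Python sketch, where repeated 2x2 average pooling stands in for the thesis's residual convolution stages (all function names here are illustrative, not from the thesis):

```python
# Sketch of multi-modal, multi-level feature extraction.
# Each "level" halves the spatial resolution by 2x2 average pooling,
# standing in for one convolution stage of a residual stream.

def avg_pool2x2(img):
    """Downsample a 2D grid (list of lists) by 2x2 average pooling."""
    h, w = len(img), len(img[0])
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
             for c in range(0, w - 1, 2)]
            for r in range(0, h - 1, 2)]

def extract_levels(img, num_levels=3):
    """Return the feature map produced after each pooling stage."""
    levels = []
    for _ in range(num_levels):
        img = avg_pool2x2(img)
        levels.append(img)
    return levels

def dual_stream_features(rgb, depth, num_levels=3):
    """Run each modality through its own stream; pair the features per level."""
    return list(zip(extract_levels(rgb, num_levels),
                    extract_levels(depth, num_levels)))

# Example: an 8x8 intensity map and a flat depth map.
rgb = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
depth = [[1.0] * 8 for _ in range(8)]
feats = dual_stream_features(rgb, depth)
print([len(level_rgb) for level_rgb, _ in feats])  # spatial size per level: [4, 2, 1]
```

The pairing per level is the point: every abstraction level contributes one RGB feature and one depth feature to the later fusion stage.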
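The adaptive fusion in (2) splits each modality's feature vector into an independent part and a shared part, then combines them under learned modality weights. A hedged sketch, with softmax-normalised scalar weights standing in for the thesis's adaptive weight learning (the half-and-half split and all names are assumptions for illustration):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_features(rgb_feat, depth_feat, raw_weights):
    """Weighted fusion of two modality feature vectors.

    The first half of each vector is treated as modality-independent and
    scaled by its softmax-normalised modality weight; the second half is
    treated as shared across modalities and simply averaged.
    """
    w_rgb, w_depth = softmax(raw_weights)
    half = len(rgb_feat) // 2
    independent = ([w_rgb * v for v in rgb_feat[:half]] +
                   [w_depth * v for v in depth_feat[:half]])
    shared = [(a + b) / 2.0 for a, b in zip(rgb_feat[half:], depth_feat[half:])]
    return independent + shared

# Equal raw weights (0.0, 0.0) give each modality weight 0.5.
fused = fuse_features([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [0.0, 0.0])
print(fused)  # [0.5, 1.0, 2.5, 3.0, 5.0, 6.0]
```

In training, `raw_weights` would be learnable parameters updated by backpropagation, which is what makes the weighting adaptive rather than fixed.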
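For the classifier in (3), the fused features of successive abstraction levels form a sequence consumed by an LSTM, whose final state feeds a Softmax layer. A toy sketch with scalar LSTM state and hand-set parameters, purely to show the gate arithmetic and the classification head (the real model uses vector states and trained weights):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, p):
    """One LSTM step with scalar state; p maps gate name -> (w_x, w_h, b)."""
    i = sigmoid(p['i'][0] * x + p['i'][1] * h + p['i'][2])    # input gate
    f = sigmoid(p['f'][0] * x + p['f'][1] * h + p['f'][2])    # forget gate
    o = sigmoid(p['o'][0] * x + p['o'][1] * h + p['o'][2])    # output gate
    g = math.tanh(p['g'][0] * x + p['g'][1] * h + p['g'][2])  # candidate cell
    c = f * c + i * g
    h = o * math.tanh(c)
    return h, c

def classify(sequence, params, class_weights):
    """Run the fused level features through the LSTM, then Softmax over classes."""
    h = c = 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    scores = [w * h + b for w, b in class_weights]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative parameters: same (w_x, w_h, b) for every gate, three classes.
params = {'i': (1.0, 0.5, 0.0), 'f': (1.0, 0.5, 0.0),
          'o': (1.0, 0.5, 0.0), 'g': (1.0, 0.5, 0.0)}
class_weights = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.1)]
probs = classify([0.2, 0.6, 0.4], params, class_weights)
print(max(range(3), key=probs.__getitem__))  # index of the predicted gesture class
```

The sequence here plays the role of the per-level fusion features; ordering them from shallow to deep is what lets the LSTM aggregate information across abstraction levels before the Softmax prediction.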
Keywords/Search Tags: RGB-D image, multi-modal and multi-level feature fusion, gesture recognition, dual-stream convolutional neural network feature extraction, human-computer interaction system