Research On Multimodal Gesture Location And Recognition Based On Attention Mechanism

Posted on: 2022-09-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Li
Full Text: PDF
GTID: 2518306737956949
Subject: Computer technology
Abstract/Summary:
With the rapid development of intelligent technology, human-computer interaction has become a crucial research field in modern society. Gesture interaction is a hot issue in this field, and multimodal interaction technology is one of its key research topics.

First, a multimodal fusion model for single-task learning is introduced to improve the accuracy and robustness of gesture recognition, since existing gesture models suffer from low precision and strong instability. To address these problems, this paper proposes a multimodal information fusion method based on a hybrid attention mechanism, which makes full use of the complementarity between modalities to recognize gestures with high accuracy. Then, a multimodal fusion model for multi-task learning is presented to address the poor utilization, in existing multi-task models, of the correlations between different tasks and between modalities. This paper proposes a multimodal gesture location and segmentation model based on the attention mechanism that adaptively fuses task-related multimodal information and completes multiple learning tasks efficiently. This research helps computer systems better understand human intentions during human-computer interaction and has important application value. The details are as follows:

1. For feature extraction within a single modality, convolutional neural networks and fully connected neural networks are introduced to extract features from the video, audio, and skeleton modalities. The resulting feature representations serve as input for the subsequent single-task multimodal fusion.

2. For the problem of multimodal fusion in gesture recognition, a multimodal fusion method based on a hybrid attention mechanism is proposed to make full use of the correlations between modalities and let them complement one another. First, a cross-attention mechanism fuses multi-dimensional feature information so that the multi-dimensional features of different modalities mutually enhance each other. Second, a single-attention mechanism operates on the one-dimensional and multi-dimensional representations to trade off the correlation against the redundancy between modalities of different dimensions. Experimental results demonstrate that, compared with the baselines, the method based on the hybrid attention mechanism achieves the best gesture recognition accuracy, 96.05%. A sketch of this fusion scheme is given below.
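The thesis publishes no source code, so the following is only a minimal PyTorch sketch of how such a hybrid-attention fusion could be wired together. All names (HybridAttentionFusion, dim, num_heads, the modality arguments) are illustrative assumptions: nn.MultiheadAttention stands in for the cross-attention component, and a learned softmax score over the pooled modality vectors stands in for the single-attention component.

# Illustrative sketch of a hybrid-attention fusion block (hypothetical names;
# not the thesis code). Cross-attention lets the multi-dimensional feature
# maps of two modalities enhance each other; a single attention layer then
# weighs the pooled per-modality vectors before classification.
import torch
import torch.nn as nn

class HybridAttentionFusion(nn.Module):
    def __init__(self, dim=256, num_heads=4, num_classes=20):
        super().__init__()
        # Cross-attention: queries from one modality, keys/values from another.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Single attention over the stacked per-modality vectors: scores each
        # modality to trade off correlation against redundancy.
        self.score = nn.Linear(dim, 1)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, video_feat, skeleton_feat, audio_feat):
        # video_feat, skeleton_feat: (B, N, dim) token sequences from CNN maps
        # audio_feat: (B, dim) one-dimensional representation
        v_enh, _ = self.cross_attn(video_feat, skeleton_feat, skeleton_feat)
        s_enh, _ = self.cross_attn(skeleton_feat, video_feat, video_feat)
        # Pool the enhanced multi-dimensional features to one vector each.
        mods = torch.stack([v_enh.mean(1), s_enh.mean(1), audio_feat], dim=1)
        weights = torch.softmax(self.score(mods), dim=1)   # (B, 3, 1)
        fused = (weights * mods).sum(dim=1)                # (B, dim)
        return self.classifier(fused)

# Usage with random tensors standing in for extracted modality features.
model = HybridAttentionFusion()
logits = model(torch.randn(2, 49, 256), torch.randn(2, 49, 256), torch.randn(2, 256))
print(logits.shape)  # torch.Size([2, 20])

The shared cross-attention layer and mean pooling are simplifications; the actual model may use separate attention parameters per modality pair and a different pooling or classification head.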
3. To tackle multimodal fusion under multi-task learning, a multimodal fusion model based on the attention mechanism is presented for gesture location and segmentation, enabling collaborative multi-task training and task-specific combinations of modal feature information. First, the feature-cross component of the adaptive cross mechanism derives different task-specific feature combinations from the shared features, finding task-related multimodal feature groups. Second, the channel attention component of the adaptive cross mechanism learns and strengthens the multimodal features, enhancing the perception of each modality. Finally, a soft attention mechanism dynamically adjusts the importance of the different tasks during training, helping the model balance and optimize multiple objective functions; a sketch follows. Experiments show that the CCSM model proposed in this paper achieves an MSE of 0.00142 on the location task, an accuracy of 0.95255 on the segmentation task, and an IoU of 0.79623, outperforming the compared multi-task methods on both tasks.
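As before, this is an illustrative sketch rather than the authors' implementation: it assumes a squeeze-and-excitation style channel attention and a softmax over learnable scores for the soft task weighting, and the names ChannelAttention and SoftTaskWeighting are hypothetical.

# Illustrative sketch (hypothetical, not the thesis code) of the two attention
# pieces described above: a channel attention that re-weights a modality's
# feature channels, and a soft attention that turns learnable task scores into
# dynamic weights over the location (MSE) and segmentation losses.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style gating over channels of a feature map."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                    # x: (B, C, H, W)
        gate = self.fc(x.mean(dim=(2, 3)))   # global average pool -> (B, C)
        return x * gate[:, :, None, None]    # strengthen informative channels

class SoftTaskWeighting(nn.Module):
    """Softmax over learnable scores gives per-task loss weights."""
    def __init__(self, num_tasks=2):
        super().__init__()
        self.scores = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):          # list of scalar loss tensors
        weights = torch.softmax(self.scores, dim=0)
        return sum(w * l for w, l in zip(weights, task_losses))

# Usage: gate a modality's feature map, then combine two task losses.
ca = ChannelAttention(channels=64)
print(ca(torch.randn(2, 64, 14, 14)).shape)  # torch.Size([2, 64, 14, 14])
weighter = SoftTaskWeighting(num_tasks=2)
total = weighter([torch.tensor(0.3), torch.tensor(0.7)])
print(total)

A softmax keeps the task weights positive and summing to one, which is one simple way to let the model shift emphasis between the location and segmentation objectives as training progresses.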
Keywords/Search Tags: Gesture recognition, Gesture location, Multimodal fusion, Multi-task learning, Deep learning, Attention mechanism