Sign language is the primary means by which people with hearing or speech impairments communicate with the world, so there is wide interest in developing effective sign language recognition methods. Compared with wearable-sensor-based methods, flourishing deep learning techniques make vision-based sign language recognition increasingly attractive. In this thesis, we therefore develop static and dynamic sign language recognition methods based on several deep learning network frameworks.

In static sign language recognition, the work includes the following. First, since the hand occupies a relatively small part of the whole image, an improved attention module is obtained by fusing a channel attention module with the coordinate attention (CA) module, so that hand features and their locations are attended to simultaneously. Second, the improved attention module is embedded into the MobileNetV2 network, which has few parameters and low computational cost, and the static sign language recognition network model is constructed and its parameters configured. Finally, the static model is compared with VGG16, ResNet50, and MobileNetV2 on the ASL and Handpose_x sign language datasets. The accuracy of this thesis's model on the ASL dataset is 99.97%, which is 0.1%, 0.06%, and 0.04% higher than that of VGG16, ResNet50, and MobileNetV2, respectively. The model's performance is revalidated on the Handpose_x dataset, which demonstrates the effectiveness and generalizability of the method.

In dynamic sign language isolated-word recognition, the main work includes the following. First, features are extracted from sign language video using the ResNet50 network, which performs well at image feature extraction; since dynamic sign language is inherently a temporal sequence, the temporal dimension of the sign language features is processed with a Transformer and an LSTM. ResNet50 is therefore fused with the Transformer and the LSTM, respectively, to recognize isolated words in dynamic sign language; experimental comparison shows that the ResNet50-LSTM network performs better for sign language recognition. Second, to address long-term temporal irregularity and the loss of important features during training of the ResNet50-LSTM network, the CBAM attention module is adopted, and a ResNet50-LSTM network based on CBAM is constructed with its parameters configured. Finally, the dynamic sign language recognition model is compared with ResNet50-Transformer and ResNet50-LSTM on the CSL-100 and DEVISIGN_D isolated-word datasets. The accuracy of this thesis's model on CSL-100 is 91.33%, which is 2.09% and 5.34% higher than that of ResNet50-Transformer and ResNet50-LSTM, respectively. The model's performance is revalidated on the DEVISIGN_D dataset, which demonstrates the effectiveness and generalizability of the method.
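The fusion of channel attention with coordinate attention used in the static model can be sketched as follows. This is a minimal PyTorch illustration under stated assumptions, not the thesis's exact implementation: the module name, reduction ratio, and pooling choices are placeholders. Channel attention re-weights feature maps by *what* is informative, while the CA branch pools along height and width separately to encode *where* the hand lies.

```python
import torch
import torch.nn as nn

class ChannelCoordAttention(nn.Module):
    """Hypothetical fusion of SE-style channel attention with
    coordinate attention (CA): channel weights select informative
    feature maps, direction-aware weights localize the hand."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        mid = max(channels // reduction, 8)
        # Channel attention branch (squeeze-and-excitation style)
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, mid, 1), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1), nn.Sigmoid(),
        )
        # Coordinate attention branch: shared transform after
        # pooling along H and W separately
        self.conv1 = nn.Sequential(
            nn.Conv2d(channels, mid, 1),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
        )
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        x = x * self.channel(x)              # channel re-weighting
        xh = x.mean(dim=3, keepdim=True)     # pool over W -> (n,c,h,1)
        xw = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (n,c,w,1)
        y = self.conv1(torch.cat([xh, xw], dim=2))  # shared 1x1 transform
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))                     # (n,c,h,1)
        aw = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2))) # (n,c,1,w)
        return x * ah * aw                   # jointly re-weighted features

att = ChannelCoordAttention(64)
out = att(torch.randn(2, 64, 16, 16))
print(out.shape)  # torch.Size([2, 64, 16, 16])
```

A block like this can be dropped after an inverted-residual stage of MobileNetV2, since it preserves the input's shape.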
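The ResNet50-LSTM pipeline for isolated-word recognition can likewise be sketched: a CNN encodes each frame to a feature vector, an LSTM models the temporal sequence, and the final hidden state is classified into word labels. To keep the example self-contained, a tiny convolutional stack stands in for ResNet50, and the feature and hidden dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CNNLSTMRecognizer(nn.Module):
    """Sketch of a CNN + LSTM video classifier: per-frame spatial
    features followed by temporal modeling over the clip."""
    def __init__(self, num_classes, feat_dim=128, hidden=256):
        super().__init__()
        # Stand-in frame encoder (ResNet50 would normally go here)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips):                          # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.encoder(clips.flatten(0, 1))      # (B*T, feat_dim)
        feats = feats.view(b, t, -1)                   # (B, T, feat_dim)
        _, (h_n, _) = self.lstm(feats)                 # last hidden state
        return self.head(h_n[-1])                      # (B, num_classes)

model = CNNLSTMRecognizer(num_classes=100)             # e.g. CSL-100 vocabulary
logits = model(torch.randn(2, 8, 3, 64, 64))           # 2 clips of 8 frames
print(logits.shape)  # torch.Size([2, 100])
```

Swapping the LSTM for a Transformer encoder over the frame features gives the ResNet50-Transformer variant compared in the thesis; inserting CBAM into the frame encoder corresponds to the attention-augmented model.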