There are currently more than 20 million deaf-mute people in China, and sign language is their most important means of communication. However, hearing people rarely encounter sign language, it is relatively difficult to learn and master, and training professional sign language interpreters takes a long time, so the number of interpreters is relatively small. The resulting communication barrier affects the education, employment and self-awareness of deaf-mute people and makes it harder for them to integrate into society. With the development of computer vision and deep learning, research on video-based automatic sign language recognition is of great significance for improving the quality of life of deaf-mute people. This paper studies sign language recognition methods based on deep learning and implements a sign language recognition function. To make sign language recognition available on mobile devices and broaden its scope of use, a lightweight sign language recognition network is further studied, which reduces the number of parameters and the computational complexity, and thereby lowers the hardware requirements of the deep neural network. The main work of this paper is as follows:

(1) The relevant theory of deep learning is studied in depth, and the CSL-10 and CSL-50 Chinese sign language datasets are produced. First, the principles and structure of convolutional neural networks are studied. Then, based on the SLR500 dataset of the University of Science and Technology of China, the CSL-10 and CSL-50 datasets are produced, containing 61,432 and 282,189 images respectively. Sign language video processing comprises key frame extraction, cropping, scaling and normalization. Key frames are extracted by sampling at equal time intervals.

(2) A sign language recognition method based on an improved CNN-LSTM network is proposed. Building on a study of three-dimensional convolution, (2+1)-dimensional convolution and the CNN-LSTM network, an improved Sh-Res-LSTM network is proposed. First, to enhance the generalization ability of the CNN-LSTM network, an improved Sh-Res residual module is proposed and embedded in the ResNet18 architecture. Then, label smoothing is introduced into the training process. Finally, the improved network is trained and tested on the CSL-10 and CSL-50 datasets, achieving recognition rates of 97% and 99.8%, respectively.

(3) A lightweight convolutional neural network improved by the Ghost module and the DFC attention mechanism is proposed. First, a conventional convolution extracts features from the original data. The extracted features are then fed into the improved C_Ghost V2 module, which is based on the Ghost module and the DFC attention mechanism and aggregates global feature information. Next, the improved G module, based on the Ghost module, extracts further features. Finally, average pooling reduces the dimensionality of the data. After this network is combined with an LSTM network, the influence of the learning rate and batch size on network performance is studied experimentally, and the optimal learning rate and batch size are determined. On the CSL-10 and CSL-50 datasets, recognition rates of 88% and 99.4% are achieved, respectively.

(4) A video-based sign language recognition system is implemented. The software and hardware platform is chosen according to actual needs, and the overall structure of the system is designed. A graphical user interface is designed around the Sh-Res-LSTM and Gnet-LSTM networks, and the functions of the sign language recognition system are realized.

In summary, this paper studies the CNN-LSTM network structure in depth. The improved Sh-Res-LSTM network improves sign language recognition performance, and the lightweight Gnet-LSTM network greatly reduces the number of network parameters and the amount of computation. In addition, a graphical user interface is designed and a video-based sign language recognition system is realized.
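
The equal-time-interval key frame extraction mentioned in item (1) can be illustrated with a minimal sketch. The abstract does not state the exact sampling rule, so the version below assumes that the video is divided into equal segments and the frame nearest the center of each segment is kept. The function name `sample_key_frames` and its parameters are hypothetical and are not taken from the paper.

```python
from typing import List


def sample_key_frames(num_frames: int, num_samples: int) -> List[int]:
    """Return indices of frames sampled at equal time intervals.

    Assumption (not from the paper): the clip is split into
    `num_samples` equal segments and the center frame of each
    segment is selected. Integer arithmetic avoids rounding drift.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    # A clip shorter than the requested sample count keeps every frame.
    if num_samples >= num_frames:
        return list(range(num_frames))
    # Center of segment i is (i + 0.5) * num_frames / num_samples.
    return [
        ((2 * i + 1) * num_frames) // (2 * num_samples)
        for i in range(num_samples)
    ]


if __name__ == "__main__":
    # A 100-frame clip reduced to 10 evenly spaced key frames.
    print(sample_key_frames(100, 10))
```

The returned indices could then be used to pick frames from a decoded video before the cropping, scaling and normalization steps. Sampling segment centers rather than segment starts is a design choice made here for illustration; it avoids always selecting the first frame, which often shows the signer at rest.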