
Research On Sign Language Recognition In Sign Language To Speech Conversion

Posted on: 2020-08-29 | Degree: Master | Type: Thesis
Country: China | Candidate: Y L An | Full Text: PDF
GTID: 2428330572485940 | Subject: Engineering
Abstract/Summary:
In recent years, the number of deaf people in China has grown, and hearing- and speech-impaired people have become a focus of attention for both the government and the public. With the rapid development of human-computer interaction technology, there is now an urgent need to enable hearing- and speech-impaired people to communicate with hearing people in daily life, to meet their needs in work, daily living, medical care and other settings, and to expand the social resources available to them. At present, most studies apply deep learning to sign-language-to-speech conversion as a whole, but there is little in-depth work on how changes to the structural parameters of Deep Belief Networks (DBN) and Convolutional Neural Networks (CNN) affect the sign language recognition rate. Studying these parameters can raise the recognition rate, improve the accuracy of sign-language-to-speech conversion, and make the synthesised speech more accurate, fluent and natural.

To this end, this thesis trains DBN and CNN models on the gestures input by the user and, combined with a neural network classifier, recognises 40 gestures used by deaf-mute people, obtaining the semantic information of each gesture and hence the text corresponding to the sign language. A text analysis program generates sign language context-dependent labels for Mandarin/Tibetan speech synthesis from the recognised text, and a deep neural network model combined with speaker adaptive training (SAT) is then used to synthesise the speech. The main work and contributions are as follows.

Firstly, a gesture database of deaf-mute gestures was built. Following the national standard sign language, gestures were captured manually with a Microsoft Kinect camera against a single plain white background to avoid interference from complex backgrounds. To make the database reasonably robust, 10 digit gestures and 30 letter gestures were collected, each performed by different people. The images were then processed in MATLAB through image graying, threshold segmentation and pixel feature extraction, after which label files were generated and combined with the images into the corresponding data files.
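As an illustration of the preprocessing steps described above (image graying, threshold segmentation and pixel feature extraction), the following is a minimal Python/OpenCV sketch. The thesis performs these steps in MATLAB; the function name, the use of Otsu's method for thresholding and the 32x32 feature size are illustrative assumptions, not the author's implementation.

```python
# Minimal sketch of the gesture preprocessing pipeline (assumed details:
# Otsu thresholding, 32x32 feature size; the thesis itself uses MATLAB).
import cv2
import numpy as np

def extract_gesture_features(image_path, size=(32, 32)):
    """Grayscale -> threshold segmentation -> fixed-size pixel feature vector."""
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)           # image graying
    # Otsu's method chooses the threshold automatically; the plain white
    # background makes the hand/background separation straightforward.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    binary = cv2.resize(binary, size)                       # normalise resolution
    return (binary.flatten() / 255.0).astype(np.float32)    # pixel features in [0, 1]
```

Feature vectors produced this way, together with the corresponding label files, can then be combined into the data files used for training.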
Secondly, the structural parameters of the DBN were optimised and the conversion of sign language to Mandarin and Tibetan speech was realised. During DBN training, structural parameters such as the number of hidden layers, the number of hidden-layer nodes and the learning rate have a large influence on the gesture recognition rate, so studying the DBN structure in the gesture recognition stage contributes to better speech synthesis. The optimal parameters obtained experimentally are three hidden layers with 250, 150 and 150 nodes and a learning rate of 0.8. The structure of the CNN was then studied in the same way: experiments were carried out on the factors that affect the gesture recognition rate, such as the learning rate and the number and size of the convolution kernels, which gives a better understanding of the CNN model and helps to synthesise better speech. For Mandarin-Tibetan bilingual speech synthesis, a DNN framework based on deep learning was adopted and MATLAB was used as the synthesis engine. Finally, the synthesised speech was evaluated with both subjective and objective methods. The experimental results show that the static gesture recognition rate of the optimal DBN is 98%, the subjective score of the synthesised Mandarin is above 4 and that of the Tibetan is above 3, and good results are also obtained in the objective evaluation, which shows that the sign-language-to-speech conversion system is feasible.
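For reference, the following is a minimal scikit-learn sketch of a DBN-style classifier with the structure reported above: three hidden layers of 250, 150 and 150 nodes and a learning rate of 0.8. The thesis's experiments were carried out in MATLAB; the greedy stacked-RBM pipeline, the logistic-regression classification head and the training settings below are illustrative assumptions rather than the author's implementation.

```python
# Sketch of a DBN-like model using the structure reported in the thesis
# (3 hidden layers: 250, 150, 150 nodes; learning rate 0.8).
# Note: a scikit-learn Pipeline trains the RBMs greedily, layer by layer,
# and does not fine-tune the whole stack as a full DBN would.
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

def build_dbn(hidden_sizes=(250, 150, 150), learning_rate=0.8):
    layers = [(f"rbm{i}", BernoulliRBM(n_components=n,
                                       learning_rate=learning_rate,
                                       n_iter=20, random_state=0))
              for i, n in enumerate(hidden_sizes)]
    # Classifier over the top-layer features; 40 gesture classes in the thesis.
    layers.append(("clf", LogisticRegression(max_iter=1000)))
    return Pipeline(layers)

# Usage (X: pixel feature vectors in [0, 1], y: gesture labels):
#   dbn = build_dbn()
#   dbn.fit(X_train, y_train)
#   print(dbn.score(X_test, y_test))
```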
Keywords/Search Tags: gesture recognition, deep belief network, convolutional neural networks, sign language context-dependent label, Mandarin and Tibetan speech synthesis