
Research On Sign Language Recognition In Sign Language To Speech Conversion

Posted on: 2020-08-29 | Degree: Master | Type: Thesis
Country: China | Candidate: Y L An | Full Text: PDF
GTID: 2428330572485940 | Subject: Engineering
Abstract/Summary:
In recent years, the number of deaf people in China has grown, and hearing- and speech-impaired people have become a focus of attention for both the government and the public. With the rapid development of human-computer interaction technology, there is now an urgent need to enable hearing- and speech-impaired people to communicate with hearing people in daily life, to meet their needs in work, daily living, medical care and other settings, and to expand the social resources available to them. At present, most studies apply deep learning to sign-language-to-speech conversion as a whole, but there is little in-depth work on how changes to the structural parameters of Deep Belief Networks (DBN) and Convolutional Neural Networks (CNN) affect the sign language recognition rate. Studying these parameters can raise the recognition rate, improve the accuracy of sign-language-to-speech conversion, and make the synthesised speech more accurate, fluent and natural.

To this end, this thesis trains DBN and CNN models on the gestures input by the user and, combined with a neural network classifier, recognises 40 gestures used by deaf-mute people, obtaining the semantic information of each gesture and hence the text corresponding to the sign language. A text analysis program generates sign language context-dependent labels for Mandarin/Tibetan speech synthesis from the recognised text, and a deep neural network model combined with speaker adaptive training (SAT) is then used to synthesise the speech. The main work and contributions are as follows.

Firstly, a gesture database of deaf-mute gestures was built. Following the national standard sign language, gestures were captured manually with a Microsoft Kinect camera against a single plain white background to avoid interference from complex backgrounds. To make the database reasonably robust, 10 digit gestures and 30 letter gestures were collected, each performed by different people. The images were then processed in MATLAB through image graying, threshold segmentation and pixel feature extraction, after which label files were generated and combined with the images into the corresponding data files.
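As an illustration of the preprocessing steps described above (image graying, threshold segmentation and pixel feature extraction), the following is a minimal Python/OpenCV sketch. The thesis performs these steps in MATLAB; the function name, the use of Otsu's method for thresholding and the 32x32 feature size are illustrative assumptions, not the author's implementation.

```python
# Minimal sketch of the gesture preprocessing pipeline (assumed details:
# Otsu thresholding, 32x32 feature size; the thesis itself uses MATLAB).
import cv2
import numpy as np

def extract_gesture_features(image_path, size=(32, 32)):
    """Grayscale -> threshold segmentation -> fixed-size pixel feature vector."""
    img = cv2.imread(image_path)
    if img is None:
        raise FileNotFoundError(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)           # image graying
    # Otsu's method chooses the threshold automatically; the plain white
    # background makes the hand/background separation straightforward.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    binary = cv2.resize(binary, size)                       # normalise resolution
    return (binary.flatten() / 255.0).astype(np.float32)    # pixel features in [0, 1]
```

Feature vectors produced this way, together with the corresponding label files, can then be combined into the data files used for training.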
Secondly, the structural parameters of the DBN were optimised and the conversion of sign language to Mandarin and Tibetan speech was realised. During DBN training, structural parameters such as the number of hidden layers, the number of hidden-layer nodes and the learning rate have a large influence on the gesture recognition rate, so studying the DBN structure in the gesture recognition stage contributes to better speech synthesis. The optimal parameters obtained experimentally are three hidden layers with 250, 150 and 150 nodes and a learning rate of 0.8. The structure of the CNN was then studied in the same way: experiments were carried out on the factors that affect the gesture recognition rate, such as the learning rate and the number and size of the convolution kernels, which gives a better understanding of the CNN model and helps to synthesise better speech. For Mandarin-Tibetan bilingual speech synthesis, a DNN framework based on deep learning was adopted and MATLAB was used as the synthesis engine. Finally, the synthesised speech was evaluated with both subjective and objective methods. The experimental results show that the static gesture recognition rate of the optimal DBN is 98%, the subjective score of the synthesised Mandarin is above 4 and that of the Tibetan is above 3, and good results are also obtained in the objective evaluation, which shows that the sign-language-to-speech conversion system is feasible.
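For reference, the following is a minimal scikit-learn sketch of a DBN-style classifier with the structure reported above: three hidden layers of 250, 150 and 150 nodes and a learning rate of 0.8. The thesis's experiments were carried out in MATLAB; the greedy stacked-RBM pipeline, the logistic-regression classification head and the training settings below are illustrative assumptions rather than the author's implementation.

```python
# Sketch of a DBN-like model using the structure reported in the thesis
# (3 hidden layers: 250, 150, 150 nodes; learning rate 0.8).
# Note: a scikit-learn Pipeline trains the RBMs greedily, layer by layer,
# and does not fine-tune the whole stack as a full DBN would.
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

def build_dbn(hidden_sizes=(250, 150, 150), learning_rate=0.8):
    layers = [(f"rbm{i}", BernoulliRBM(n_components=n,
                                       learning_rate=learning_rate,
                                       n_iter=20, random_state=0))
              for i, n in enumerate(hidden_sizes)]
    # Classifier over the top-layer features; 40 gesture classes in the thesis.
    layers.append(("clf", LogisticRegression(max_iter=1000)))
    return Pipeline(layers)

# Usage (X: pixel feature vectors in [0, 1], y: gesture labels):
#   dbn = build_dbn()
#   dbn.fit(X_train, y_train)
#   print(dbn.score(X_test, y_test))
```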
Keywords/Search Tags: gesture recognition, deep belief network, convolutional neural networks, sign language context-dependent label, Mandarin and Tibetan speech synthesis