
Research on Sign Language-to-Mandarin/Tibetan Speech Conversion

Posted on: 2018-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: X C An
GTID: 2428330515495578
Subject: Circuits and Systems
Abstract/Summary:
Computer vision-based sign language recognition and speech synthesis are both important research topics in human-computer interaction; each has a wide range of applications and has become a research hot spot. However, existing studies treat sign language recognition and speech synthesis as separate problems. Because sign language-to-speech conversion has received little study, the difficulty that people with speech disorders face in communicating with healthy people has not been addressed. This thesis proposes a method of sign language-to-Mandarin/Tibetan speech conversion to address that communication problem. Predefined sign languages are first recognized using depth image technology, deep learning and a hidden Markov model (HMM), each applied independently. The text of the sign language is then obtained from the recognition results, and a text analyzer generates context-dependent labels for speech synthesis from the recognized text. Meanwhile, an HMM-based Mandarin-Tibetan bilingual speech synthesis system is developed using speaker adaptive training (SAT). Mandarin or Tibetan speech is then synthesized from the context-dependent labels generated from the recognized sign languages. The main work and contributions are as follows:

Firstly, 10 kinds of digit sign languages, 30 kinds of alphabetic sign languages and 6 kinds of dynamic sign languages are recognized separately. The sign language regions are first extracted from complex environments using depth information. A speeded-up robust features (SURF) algorithm is adopted to extract and match the sign language features. The 30 kinds of alphabetic sign languages are then recognized by combining deep learning with a support vector machine (SVM). Finally, following the trajectory model of dynamic sign languages, the angles between adjacent trajectories are extracted and used as the sign language features, and the 6 kinds of dynamic sign languages are recognized with an HMM. Experimental results show that the predefined sign languages are accurately identified and classified by these three methods.

Secondly, the context-dependent labels of the sign languages are obtained. The semantics of each sign language are first expressed in Chinese or Tibetan. A Chinese-Tibetan bilingual text analysis is then used to obtain the context information of those semantics, including the initials and finals, syllables, words, prosodic words, prosodic phrases and sentences. Context-dependent labels are generated from this context information and saved to a sign language dictionary for the speech synthesis system.

Finally, the thesis realizes sign language-to-Mandarin/Tibetan bilingual speech synthesis. A large multi-speaker Mandarin speech corpus and a small single-speaker Tibetan speech corpus are used to train an average mixed-lingual voice model with SAT. The Mandarin or Tibetan speech corpus is then used to perform the speaker adaptation transformation, yielding a speaker-dependent Mandarin or Tibetan model, from which the Mandarin or Tibetan speech is finally synthesized. Experimental results show that, when the sign languages are known, the average accuracy with which the synthesized speech expresses them is 72.49% for syllables, 86.3% for words and 96.36% for sentences. When subjects listen to the synthesized speech and are asked to select the corresponding sign language, the average accuracy of recognizing the sign languages is 89.87%.
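The abstract does not say how the angles between adjacent trajectories are computed or how they are passed to the HMM. The following is only a minimal sketch of one common way to build such a feature, under assumptions that are not from the thesis: the hand trajectory is a list of 2D points, the feature is the signed turning angle between consecutive segments, and the angles are quantized into 8 bins to give a discrete observation sequence. The function names `segment_directions`, `turning_angles` and `quantize_angles` are hypothetical.

```python
import math


def segment_directions(points):
    """Direction (degrees in [0, 360)) of each segment between consecutive points."""
    directions = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # skip repeated points, which have no direction
        directions.append(math.degrees(math.atan2(dy, dx)) % 360.0)
    return directions


def turning_angles(points):
    """Signed angle (degrees in [-180, 180)) between each pair of adjacent segments."""
    directions = segment_directions(points)
    return [(b - a + 180.0) % 360.0 - 180.0
            for a, b in zip(directions, directions[1:])]


def quantize_angles(angles, n_bins=8):
    """Map each angle to the nearest of n_bins equally spaced directions.

    The bin count is an assumption made for illustration; the resulting
    integers could serve as discrete observation symbols for an HMM.
    """
    width = 360.0 / n_bins
    return [int(((a % 360.0) + width / 2.0) // width) % n_bins for a in angles]


# Usage: a trajectory that moves right, then up, then left.
trajectory = [(0, 0), (1, 0), (1, 1), (0, 1)]
angles = turning_angles(trajectory)
symbols = quantize_angles(angles)
```

A sequence of symbols like this, one per dynamic sign sample, is the kind of input on which a separate discrete HMM per sign could be trained, with recognition done by picking the model that gives the highest likelihood. Whether the thesis uses discrete or continuous observations is not stated.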
Keywords/Search Tags:sign language recognition, deep learning, context-dependent label, speaker adaptive training, Mandarin-Tibetan bilingual speech synthesis