
Research on Speech-to-Gesture Conversion Based on Keyword Spotting

Posted on: 2018-08-01
Degree: Master
Type: Thesis
Country: China
Candidate: N Zhao
Full Text: PDF
GTID: 2428330515995573
Subject: Intelligent information processing
Abstract/Summary:
In recent years, the number of people with hearing and speech impairments in China has grown year by year. They are regarded as one of society's vulnerable groups. Although sign language is their main means of communicating with each other, it has yet to be widely learned and used by the rest of society. As a result, language barriers limit their access to social resources in daily life, entertainment, medical treatment, work, and study. With the rapid development of human-computer interaction technology in China and abroad, building a bridge of communication for this community has become an urgent need. Gesture synthesis has attracted growing research attention as a way to improve the position of people with hearing and speech impairments and to help them integrate into mainstream society, but little work combines speech recognition with gesture synthesis to achieve speech-to-gesture conversion. This thesis therefore studies the keyword spotting process and analyzes the physical structure of the human hand, on the basis of "Chinese Sign Language", so that spoken keywords are displayed as sign language by a 3D hand model, realizing the complete conversion from speech to gesture. The main innovations and contributions of this thesis are as follows.

Firstly, this thesis builds a sign language model library and a keyword corpus. The gesture vocabulary, defined on the basis of "Chinese Sign Language", covers letters, numbers, and common words. The gesture model library is created by combining a physical structure analysis of the hand model, the establishment of a three-dimensional coordinate system, the calculation of the bending of each finger's bone joints and of hand shape similarity, and the interpolation of key-frame animation. The library consists of static and dynamic 3D gesture models built with the 3ds Max modeling tools and a variety of modeling methods. The models are then refined with texturing, color mapping, rendering, and similar operations to complete the library. Next, the speech sample data files are recorded; the speech data are taken from keyword utterances designed according to the defined sign vocabulary. Audio processing software is used for noise reduction, keyword segmentation, and other operations to establish the keyword corpus.

Secondly, this thesis designs and implements a speech-to-gesture conversion system based on keyword spotting, emphasizing an integrated design that combines keyword recognition with gesture synthesis. The input keyword corpus undergoes preprocessing, feature extraction, and other operations. After an HMM-based acoustic model has been trained, the speech keywords in the corpus are recognized. Gestures are then played back with the Open Graphics Library (OpenGL), using the image-text correspondence between gesture models and speech keywords, thereby realizing the conversion from speech to gesture.

Thirdly, this thesis evaluates the implemented system. Cross-validation is used to compare the keyword spotting results across multiple keyword recognition experiments. In addition, the accuracy of the converted gestures is assessed quantitatively according to whether they accurately express the meaning of the keywords. The test results show an average keyword spotting recognition rate of 96.4%, and a mean opinion score (MOS) of 4.4 points for the converted gestures, with a standard deviation of 0.3 points. The system therefore achieves speech-to-gesture conversion well.
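
The key-frame interpolation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which interpolation scheme the thesis uses, so the code below assumes plain linear interpolation of finger joint bend angles between two key poses. The joint names, the angle values, and the helper functions `interpolate_pose` and `inbetween_frames` are all hypothetical and serve only as an illustration.

```python
from typing import Dict, List

# A pose maps a joint name to its bend angle in degrees.
# Joint names and angles below are hypothetical, not taken from the thesis.
Pose = Dict[str, float]

OPEN_HAND: Pose = {"index_mcp": 0.0, "index_pip": 0.0, "thumb_ip": 0.0}
FIST: Pose = {"index_mcp": 90.0, "index_pip": 100.0, "thumb_ip": 60.0}


def interpolate_pose(start: Pose, end: Pose, t: float) -> Pose:
    """Linearly blend two key poses; t=0 gives start, t=1 gives end."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if start.keys() != end.keys():
        raise ValueError("both poses must define the same joints")
    return {joint: (1.0 - t) * start[joint] + t * end[joint] for joint in start}


def inbetween_frames(start: Pose, end: Pose, n_frames: int) -> List[Pose]:
    """Return n_frames poses evenly spaced from start to end, inclusive."""
    if n_frames < 2:
        raise ValueError("need at least the two key frames")
    return [
        interpolate_pose(start, end, i / (n_frames - 1))
        for i in range(n_frames)
    ]


# Assumed usage: five frames for a dynamic gesture closing the hand.
frames = inbetween_frames(OPEN_HAND, FIST, 5)
```

Linear blending is the simplest choice; a modeling tool such as 3ds Max typically also offers spline-based controllers, which give smoother motion at the cost of more parameters.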
Keywords/Search Tags: keyword recognition, gesture modeling, speech-to-gesture conversion, 3ds Max, HMM, OpenGL