
Research Of Emotional Speech Recognition And Synthesis

Posted on: 2012-07-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Sun
Full Text: PDF
GTID: 1118330371990769
Subject: Circuits and Systems

Abstract/Summary:
Emotional speech recognition and synthesis are hot topics in speech signal processing. The purpose of these technologies is not only to make computers understand the emotional state in human speech, but also to make them speak as humans do; an intelligent human-machine interface built on them makes communication between people and machines smoother. Emotional speech recognition is a new direction in speech recognition technology, and it is a hard problem because the concept of emotion itself is uncertain and the emotional features in speech are fuzzy. Many researchers study speech synthesis, but how to build a system with a small memory footprint and high naturalness remains the central challenge. Focusing on these difficulties, this dissertation proposes emotional speech recognition features based on the human auditory system, compensates those features with glottal features, implements a speech synthesis system based on the Hidden Markov Model (HMM), and, building on that system, analyzes the parameters of synthesized speech and adds emotional features to realize an emotional speech synthesis system.

The main work and innovative achievements of this dissertation are as follows:

(1) The performance of prosodic features in different emotional states was analyzed on the basis of an in-depth study of emotion theory. The TYUT emotional speech database was built, covering three emotional states ("Happiness", "Anger", and "Neutral") and two languages (Mandarin and English). The validity of the database was verified by perceptual experiments and by analysis of typical features.

(2) The Zero Crossings with Peak Amplitudes (ZCPA) features were applied to emotional speech recognition.
The zero-crossing features in ZCPA, which represent frequency and voicing rate, were combined with the nonlinear energy features of the Teager energy operator to form a new feature: Zero Crossings with Maximal Teager Energy Operator (ZCMT). ZCMT performed well in emotional speech recognition.

(3) The acoustic model and the auditory model were combined. By analyzing the influence of glottal features on human-auditory-model features, a new glottal-compensation algorithm for the human auditory model was proposed. Applied to emotional speech recognition experiments, the new algorithm achieved a high recognition rate and performed well.

(4) Real conversational environments are complex, so a merged-database experiment was designed to test the database independence of emotional features. Comparison across the merged-database experiments showed that ZCMT had the lowest database dependency of all the features studied in this dissertation.

(5) To synthesize emotional speech, an HMM-based speech synthesis system was built first; the parameters of the synthesized speech were then modified, and an HMM-based emotional speech synthesis system was finally realized. Preliminary synthesized emotional speech was obtained with this system.
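The abstract does not give the exact ZCMT formulation, but its two ingredients are standard: the zero-crossing rate of a speech frame and the discrete Teager energy operator Ψ[x(n)] = x(n)² − x(n−1)·x(n+1). The sketch below computes both on a synthetic frame; the frame parameters (8 kHz sampling, a 500 Hz tone) and the way the two values are stacked into a feature vector are illustrative assumptions, not the dissertation's method.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator: Psi[x(n)] = x(n)^2 - x(n-1)*x(n+1).
    Endpoints have no two neighbors, so they are left at zero."""
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    x = np.asarray(x, dtype=float)
    signs = np.signbit(x)
    return np.mean(signs[:-1] != signs[1:])

# Toy frame: a 500 Hz pure tone at 8 kHz. For a unit cosine cos(w*n),
# the Teager operator is exactly sin(w)^2 at every interior sample.
fs = 8000
n = np.arange(256)
frame = np.cos(2 * np.pi * 500 / fs * n)

# One hypothetical per-frame feature pair: ZCR plus peak Teager energy.
feat = np.array([zero_crossing_rate(frame), teager_energy(frame)[1:-1].max()])
```

For a 500 Hz tone at 8 kHz the zero-crossing rate is about 1000/8000 = 0.125 per sample, and the Teager energy is the constant sin²(2π·500/8000); in a real front end these values would be computed per frame over windowed speech rather than over a synthetic tone.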
Keywords/Search Tags: emotional speech recognition, speech synthesis, emotional speech database, human auditory model, glottal feature, emotional speech synthesis