Emotional Speech Conversion And Recognition Based On The Three-dimensional PAD Model

Posted on: 2010-10-31
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhou
GTID: 2178360278996711
Subject: Circuits and Systems
Abstract/Summary:
Speech is the main means of interpersonal communication and one of the most natural channels for human-machine interaction. Natural speech conveys not only linguistic content but also the speaker's emotion, so the study of emotional information in speech has both theoretical and practical significance. This dissertation studies emotional speech recognition and conversion using recorded speech data covering 11 emotion types. It analyzes the relationship between emotional speech parameters and PAD values and, on this basis, recognizes emotion in speech and converts neutral speech into various emotional styles.

The main contributions are:

1. The dissertation introduces the 3-D PAD (pleasure/displeasure, arousal/non-arousal, dominance/submissiveness) emotion model to represent the emotions in Chinese speech quantitatively, with more attention to their internal composition. Complex and subtle emotions can thus be represented in a continuous 3-D space, which makes it possible to compute with emotions quantitatively. The dissertation analyzes how prosodic feature parameters differ across emotions and across the three PAD dimensions, and examines the correlation between prosodic features, spectral characteristics, and the three PAD dimensions. Some conclusions from this analysis offer guidance for future studies of emotional speech.

2. The dissertation proposes a novel approach to emotional speech conversion based on support vector regression (SVR). By analyzing the prosodic features of paired neutral and emotional recordings, an SVR-based model is developed to transform the acoustic features of neutral speech toward those of emotional speech. Emotional mean opinion score (EMOS) results show that the converted speech, which achieved a score of 3.4, can express emotion.

3. The dissertation proposes a novel approach to continuous emotional speech recognition based on empirical mode decomposition (EMD), the decomposition stage of the Hilbert-Huang transform, combined with SVR. First, the emotional speech signal is decomposed by EMD into several intrinsic mode functions (IMFs); the useful IMF components are selected and segmented. Second, features are extracted from the segmented IMFs and assembled into feature vectors for training. Finally, SVR is used to predict the PA values. Compared with short-time processing techniques, EMD is better suited to speech signal processing. Experiments show that this method can effectively predict PA values.

As a new attempt, this dissertation proposes two novel approaches that have a theoretical basis and practical effect, and that can benefit future work on emotional speech recognition and conversion.
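
The segmentation and feature-extraction step in contribution 3 could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which features are extracted, so mean energy and zero-crossing rate are placeholders chosen here, the function names are hypothetical, and two toy sinusoids stand in for IMFs that a real EMD would produce. The EMD decomposition and the SVR training are not shown.

```python
import math

def segment(signal, n_segments):
    """Split a sequence into n_segments contiguous, nearly equal parts."""
    n = len(signal)
    bounds = [round(i * n / n_segments) for i in range(n_segments + 1)]
    return [signal[bounds[i]:bounds[i + 1]] for i in range(n_segments)]

def segment_features(seg):
    """Return (mean energy, zero-crossing rate) for one segment.

    Assumption: these two features are illustrative placeholders,
    not the features used in the dissertation.
    """
    if not seg:
        return (0.0, 0.0)
    energy = sum(x * x for x in seg) / len(seg)
    crossings = sum(1 for a, b in zip(seg, seg[1:]) if a * b < 0)
    zcr = crossings / max(len(seg) - 1, 1)
    return (energy, zcr)

def feature_vector(imfs, n_segments=4):
    """Concatenate per-segment features of every IMF into one flat vector."""
    vec = []
    for imf in imfs:
        for seg in segment(imf, n_segments):
            vec.extend(segment_features(seg))
    return vec

# Toy stand-ins for two IMFs (hypothetical data, not real speech):
# a fast oscillation and a slower, weaker one.
fs = 800
fast = [math.sin(2 * math.pi * 50 * t / fs) for t in range(fs)]
slow = [0.5 * math.sin(2 * math.pi * 5 * t / fs) for t in range(fs)]

vec = feature_vector([fast, slow], n_segments=4)
print(len(vec))  # 2 IMFs x 4 segments x 2 features = 16
```

A vector like `vec` would be one training input; a regressor such as SVR would then be fitted separately for each target dimension (for example, one model for P and one for A), since standard SVR predicts a single scalar output.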
Keywords/Search Tags: emotional speech, 3-D PAD emotion model, support vector regression algorithm, empirical mode decomposition (EMD), speech conversion, speech recognition