Emotional Speech Conversion And Recognition Based On The Three-dimensional PAD Model

Posted on:2010-10-31

Degree:Master

Type:Thesis

Country:China

Candidate:H Zhou

Full Text:PDF

GTID:2178360278996711

Subject:Circuits and Systems

Abstract/Summary:

Speech is the main channel of interpersonal communication and one of the most natural modes of human-machine interaction. Natural speech carries not only linguistic content but also the speaker's emotion, so the study of emotional information in speech has both theoretical and practical significance. This dissertation studies emotional speech recognition and conversion using recorded speech data covering 11 emotion types. It analyzes the relationship between emotional speech parameters and PAD values and, on this basis, recognizes emotion in speech and converts neutral speech into various emotional styles.

The main contributions are:

1. The dissertation introduces the 3-D PAD (pleasure/displeasure, arousal/non-arousal, dominance/submissiveness) emotion model to represent the emotions contained in Chinese speech quantitatively, with more attention to their internal composition. Complex and subtle emotions can thus be represented in a continuous 3-D space, which makes it possible to compute with emotions quantitatively. The dissertation analyzes the prosodic feature parameters of different emotions and how they differ across the three PAD dimensions, and examines the correlation between prosodic features, spectral features, and the three PAD dimensions. Some of the resulting conclusions can guide future studies of emotional speech.

2. The dissertation proposes a novel approach to emotional speech conversion based on support vector regression (SVR). By analyzing the prosodic features of paired neutral and emotional recordings, an SVR-based model is developed that transforms the acoustic features of neutral speech toward those of emotional speech. Emotional mean opinion score (EMOS) results show that the converted speech, which achieved a score of 3.4, can express emotion.

3. The dissertation proposes a novel approach to continuous emotional speech recognition based on empirical mode decomposition (EMD), the decomposition step of the Hilbert-Huang transform, and SVR. First, the emotional speech is decomposed by EMD into several intrinsic mode functions (IMFs), the useful IMF components are selected, and these components are segmented. Second, features are extracted from the segmented IMFs and assembled into feature vectors for training. Finally, PA values are predicted with SVR. Compared with short-time processing techniques, EMD is better suited to speech signal processing. Experiments show that the method can effectively predict PA values.

As a new attempt, the two approaches proposed in this dissertation have a theoretical basis and practical effect, and should benefit future work on speech recognition and speech conversion.
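The segmentation and feature-extraction steps of the third contribution can be illustrated with a minimal Python sketch. The abstract does not say which features are taken from the segmented IMFs, so the per-segment mean energy and zero-crossing rate used here are assumptions, as are the function names and the segment length. The two sine waves are toy stand-ins for IMFs, since EMD itself is not implemented in this sketch.

```python
import math

def segment(signal, seg_len):
    """Split a sequence into consecutive, non-overlapping segments.

    A trailing partial segment is dropped.
    """
    return [signal[i:i + seg_len]
            for i in range(0, len(signal) - seg_len + 1, seg_len)]

def segment_features(seg):
    """Return (mean energy, zero-crossing rate) for one segment.

    These two features are assumed for illustration; the abstract
    does not name the features actually used.
    """
    energy = sum(x * x for x in seg) / len(seg)
    crossings = sum(1 for a, b in zip(seg, seg[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(seg) - 1)
    return energy, zcr

def feature_vector(imfs, seg_len):
    """Concatenate per-segment features over all selected IMFs."""
    vec = []
    for imf in imfs:
        for seg in segment(imf, seg_len):
            vec.extend(segment_features(seg))
    return vec

# Toy stand-ins for two IMFs: a fast and a slow oscillation.
fs = 800  # hypothetical sampling rate in Hz
imf_fast = [math.sin(2 * math.pi * 100 * n / fs) for n in range(fs)]
imf_slow = [0.5 * math.sin(2 * math.pi * 10 * n / fs) for n in range(fs)]

features = feature_vector([imf_fast, imf_slow], seg_len=200)
print(len(features))  # 2 IMFs x 4 segments x 2 features per segment
```

In a full pipeline, one such vector per utterance would be paired with its annotated P and A values and passed to a regressor, for example scikit-learn's `sklearn.svm.SVR`, with one model per dimension. That pairing is likewise an assumption about how the final prediction step might be set up.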

Keywords/Search Tags:

emotional speech, 3-D PAD emotion model, support vector regression algorithm, empirical mode decomposition (EMD), speech conversion, speech recognition

Related items

1	Research On Emotional Speech Based On PAD Three-Dimensional Emotion Model
2	The Research On Feature Extraction And Recognition Of Emotional Speech
3	Speech Emotion Recognition Based On Features
4	Research And Implementation Of Gaussian Mixture Model-based Speech Emotion Recognition
5	Empirical Mode Decomposition Method And Its Research In Speech Recognition Algorithm
6	Research And Implementation Of Speech Emotion Recognition Algorithm Based On Fusion
7	Research On Key Techniques Of Speech Emotion Recognition
8	Application And Research Of PAD Emotion Model In Speech Emotion Recognition
9	The Research Of Speech Emotion Based On Multi-feature Extraction And Fusion
10	Study On Speech Emotion Recognition And Its Application