
Application And Research Of PAD Emotion Model In Speech Emotion Recognition

Posted on: 2017-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: J Song
GTID: 2308330503457518
Subject: Electronics and Communications Engineering
Abstract/Summary:
Everyday human speech carries not only textual information but also rich emotional states. This thesis studies emotional speech recognition in the context of affective computing. A natural, realistic and effective emotional speech database was first established, and the PAD emotion model, a continuous theory of emotion, was applied on that basis. Hesitant fuzzy information was used to predict PAD values, giving a quantitative analysis of speech emotion. The main work is as follows:

1. Four emotion types were chosen: happiness, anger, sadness and surprise. The preliminary emotional speech database was excerpted from radio dramas. Compared with an acted emotional speech database, it covers a wider variety of emotions, characters and life scenes, so the speech is closer to real life, more practical, and consistent with everyday habits of expression.

2. To improve the quality of the emotional speech database, a model was established to evaluate the preliminary database. First, based on fuzzy judgment, a fuzzy comprehensive evaluation model and system were built on five criteria: emotional accuracy, background noise, clarity, naturalness and picture sense. The analytic hierarchy process and the entropy method were then applied to determine the comprehensive weights. Finally, the final emotional speech database, TYUT2.0, was obtained by screening with this model.

3. A new approach relating emotional speech features to the three-dimensional PAD emotion model was proposed. In addition to the traditional discrete theory of emotion, that is, the study of the four basic emotions of happiness, anger, sadness and surprise, the PAD (pleasure, arousal, dominance) three-dimensional model was applied to describe emotion types from the viewpoint of continuous emotional dimensions. On the basis of the TYUT2.0 speech database, five speech features (Mel-Frequency Cepstral Coefficients, Linear Prediction Coefficients, prosodic features, formant frequencies, and Zero Crossings with Peak Amplitudes) were extracted and applied to emotional speech recognition. The recognition results were mapped to the three-dimensional PAD emotion space for the first time. The correlation between the three PAD dimensions and the different acoustic features was calculated using the Pearson correlation coefficient. This correlation analysis can be applied to optimize and adjust speech features, and it provides a basis for subsequent emotional speech recognition based on continuous dimension theory.

4. A decision fusion method based on hesitant fuzzy information was proposed. Starting from the recognition results of the different features, the fusion weights were determined from the correlation coefficients between the speech features and the three dimensions of the PAD model. Using the similarity of hesitant fuzzy sets, emotional speech could be represented numerically from the perspective of continuous dimensions. The emotional speech was mapped into the three-dimensional emotion space, realizing emotional speech recognition based on continuous dimension theory. The spatial distribution in pleasure, arousal and dominance makes it possible to analyze the basic emotions of speech and the causes of misjudgments in emotional speech recognition.
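As a minimal sketch of how the correlation analysis in item 3 and the weighting in item 4 might fit together, the Python below computes a Pearson correlation between one feature's predicted scores and reference ratings on a PAD dimension, then normalizes the absolute correlations into fusion weights. The function names, the normalization by absolute value, and all numbers are assumptions made for illustration; the abstract does not give the thesis's exact weighting formula.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def fusion_weights(correlations):
    """Turn per-feature correlations into weights that sum to 1.

    Assumption: weights are proportional to |r|. This is one plausible
    choice, not necessarily the rule used in the thesis.
    """
    total = sum(abs(r) for r in correlations.values())
    if total == 0:
        raise ValueError("all correlations are zero")
    return {name: abs(r) / total for name, r in correlations.items()}


# Hypothetical data: reference pleasure ratings for five utterances and the
# pleasure scores predicted from two feature sets (values are made up).
reference_pleasure = [0.8, -0.6, -0.4, 0.5, 0.1]
predicted = {
    "mfcc": [0.7, -0.5, -0.2, 0.6, 0.0],
    "prosodic": [0.2, -0.1, -0.6, 0.1, 0.3],
}

correlations = {
    name: pearson(scores, reference_pleasure)
    for name, scores in predicted.items()
}
weights = fusion_weights(correlations)
```

The same computation would be repeated for the arousal and dominance dimensions, giving one set of weights per PAD dimension.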
Keywords/Search Tags: emotional speech recognition, emotional speech database, PAD emotion model, correlation analysis, hesitant fuzzy information