
Emotion Recognition By Speech Signal In Mandarin

Posted on: 2008-09-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Zhang
GTID: 1118360212999079
Subject: Precision instruments and machinery
Abstract/Summary:
The goal of this thesis is emotion recognition from Mandarin speech signals, that is, determining a speaker's emotional state from recorded utterances. Although many researchers have investigated emotion recognition from speech and achieved some results, no satisfactory solution exists yet, and the literature on Mandarin Chinese in particular is very limited. This thesis therefore addresses four aspects of emotion recognition from Mandarin speech: 1) collection of Mandarin emotional speech, 2) acoustic analysis, 3) feature extraction and selection, and 4) classification. The emotional states studied are limited to four: Anger, Fear, Joy, and Sadness, supplemented by a neutral state to distinguish them from non-emotional speech.

Mandarin emotional speech collection: based on an analysis of several international emotional speech databases, we chose the subjects, the speakers, and the type of emotion (natural, simulated, or elicited). After a perceptual listening test on the collected data, a Mandarin emotional speech database was built for further research.

Acoustic analysis of emotional speech: the vocal expressions of anger, fear, joy, and sadness are analyzed acoustically relative to neutral speech. The features investigated include duration, short-time amplitude, and pitch at both the word and the sentence level. Word stress in Mandarin speech is studied in particular with respect to emotional arousal. Based on the analytical results, an overall quantitative measure of the distinctive characteristics of emotional speech is presented.

Feature extraction and selection: based on the current literature, 208 acoustic correlates are extracted, including pitch, short-time energy, short-time amplitude, signal strength, and duration.
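The abstract does not list the 208 correlates. The following Python sketch therefore only shows how the basic contour types named above could be computed and summarized:

- short-time energy,
- short-time amplitude, taken here as the mean absolute amplitude per frame,
- an autocorrelation pitch estimate,
- utterance duration.

Each contour is summarized by simple statistics. The frame length, hop, pitch search range, voicing threshold, choice of statistics, and all function names are assumptions for illustration, not the author's settings.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one per row (assumes len(x) >= frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    return np.sum(frames ** 2, axis=1)

def short_time_amplitude(frames):
    # Mean absolute amplitude per frame (one common definition; an assumption here).
    return np.mean(np.abs(frames), axis=1)

def frame_pitch(frame, sr, fmin=60.0, fmax=500.0, voicing=0.3):
    """Autocorrelation F0 estimate in Hz; returns 0.0 for frames that look unvoiced."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    if ac[0] <= 0:
        return 0.0
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), len(frame) - 1)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag if ac[lag] / ac[0] > voicing else 0.0

def utterance_features(x, sr, frame_ms=32, hop_ms=16):
    """Summarize energy, amplitude and pitch contours with simple statistics."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = frame_signal(x, frame_len, hop)
    f0 = np.array([frame_pitch(f, sr) for f in frames])
    contours = {
        "energy": short_time_energy(frames),
        "amplitude": short_time_amplitude(frames),
        "pitch": f0[f0 > 0],  # voiced frames only
    }
    feats = {"duration_s": len(x) / sr}
    for name, track in contours.items():
        if track.size == 0:
            track = np.zeros(1)
        feats[name + "_mean"] = float(np.mean(track))
        feats[name + "_max"] = float(np.max(track))
        feats[name + "_min"] = float(np.min(track))
        feats[name + "_range"] = float(np.max(track) - np.min(track))
    return feats

# Usage: one second of a 200 Hz tone sampled at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
feats = utterance_features(0.5 * np.sin(2 * np.pi * 200 * t), sr)
```

Combining several contours with several statistics per contour is one common way a handful of measurements grows into a large, partly redundant feature set.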
However, these 208 features are redundant and computationally expensive to analyze. Traditional methods assess each feature's ability to separate all emotions simultaneously. Our study instead assesses how well each feature separates each pair of the five emotional states. Based on this study, 28 acoustic correlates are selected for the recognition task.

Classification method: traditionally, the speaker's emotional state is recognized in a single pass, with one classification model applied to all emotions at once. However, an analysis of the classification performance for each pair of emotions shows that different features discriminate different emotions to different degrees. Drawing on decision tree theory, we therefore propose an efficient recognition procedure called the cascade bisection process (CB-process). It recognizes emotions through a series of bisecting steps and applies a different classification model at each step, so that each step can rely on the features best suited to the emotions it separates. In this way the information extracted from the features is used more fully, and the experimental results show better recognition performance.

To improve recognition performance further, we look for additional features with better discriminative ability. The critical bands of emotional Mandarin speech are therefore analyzed, and two critical-band-based features are derived, each with its own ability to discriminate particular emotions. Combining them with the original CB-process yields better recognition performance, as the experimental results demonstrate.

One drawback of the tree structure, however, is that estimation errors made at the first stage propagate to the next stage and onward.
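The abstract does not name the criterion used to score a feature on each pair of emotions. The sketch below uses the Fisher discriminant ratio as one plausible stand-in. It ranks features separately for every pair of classes and keeps the union of each pair's top-k features. The criterion, the value of `k`, and the function names are assumptions; the sketch does not reproduce how the thesis arrived at its 28 features.

```python
from itertools import combinations
import numpy as np

def fisher_ratio(a, b):
    """Fisher discriminant ratio of a single feature between two classes."""
    return (np.mean(a) - np.mean(b)) ** 2 / (np.var(a) + np.var(b) + 1e-12)

def pairwise_top_features(X, y, k):
    """Rank features for every pair of classes; return the union of each pair's top-k indices."""
    y = np.asarray(y)
    per_pair = {}
    for c1, c2 in combinations(sorted(set(y.tolist())), 2):
        a, b = X[y == c1], X[y == c2]
        scores = [fisher_ratio(a[:, j], b[:, j]) for j in range(X.shape[1])]
        ranked = np.argsort(scores)[::-1]  # best-separating feature first
        per_pair[(c1, c2)] = [int(j) for j in ranked[:k]]
    selected = sorted({j for top in per_pair.values() for j in top})
    return selected, per_pair

# Usage with synthetic data: feature 0 separates class B, feature 1 separates class C.
rng = np.random.default_rng(0)
y = np.array(["A"] * 30 + ["B"] * 30 + ["C"] * 30)
X = rng.normal(size=(90, 5))
X[y == "B", 0] += 10.0
X[y == "C", 1] += 10.0
selected, per_pair = pairwise_top_features(X, y, k=1)
```

Scoring per pair means a feature that separates only two emotions well can still be kept. A ranking over all classes at once could miss such a feature.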
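The two critical-band features are not defined in this abstract. The sketch below shows only a common starting point for such features. It maps FFT bins to the Bark (critical-band rate) scale with the Zwicker and Terhardt approximation, then computes the log energy of one frame in each integer Bark band. The Hamming window, the use of integer Bark values as band edges, and the function names are assumptions. Which quantities the thesis derives from the bands is not stated.

```python
import numpy as np

def hz_to_bark(f):
    """Critical-band rate in Bark (Zwicker and Terhardt approximation)."""
    f = np.asarray(f, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def critical_band_energies(frame, sr):
    """Log energy of one frame in each integer Bark band from 0 Hz up to Nyquist."""
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    band = np.floor(hz_to_bark(freqs)).astype(int)  # band index of every FFT bin
    energies = np.bincount(band, weights=power, minlength=band.max() + 1)
    return np.log(energies + 1e-10)

# Usage: a 32 ms frame of a 1 kHz tone at 16 kHz.
sr = 16000
frame = np.sin(2 * np.pi * 1000 * np.arange(512) / sr)
e = critical_band_energies(frame, sr)
```

At 16 kHz this yields 22 bands (Bark 0 to 21), and the 1 kHz tone falls in band 8.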
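Below is a minimal sketch of a cascade of bisecting steps in which each step is its own binary classifier over its own features. The only part taken from the abstract is that Anger and Joy are decided last. The tree shape before that final step, the feature vector layout, and every threshold are hypothetical. The sketch also makes the error propagation just described concrete: a sample sent down the wrong branch at step 1 can only end at a leaf of that branch.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union

@dataclass
class Node:
    """One bisecting step: `goes_left` is this step's own binary classifier."""
    goes_left: Callable[[Sequence[float]], bool]
    left: Union[str, "Node"]
    right: Union[str, "Node"]

def cb_classify(tree, x):
    """Route x through the cascade; each step only sees samples routed to it."""
    while isinstance(tree, Node):
        tree = tree.left if tree.goes_left(x) else tree.right
    return tree

def leaves(tree):
    """All emotion labels reachable from a (sub)tree."""
    if isinstance(tree, Node):
        return leaves(tree.left) | leaves(tree.right)
    return {tree}

# Hypothetical cascade; x = [pitch_mean_hz, energy_mean, pitch_range_hz] (illustrative).
cascade = Node(lambda x: x[0] < 150.0,                         # step 1
               Node(lambda x: x[1] < 0.2, "Sadness", "Neutral"),
               Node(lambda x: x[2] > 80.0, "Fear",             # step 2
                    Node(lambda x: x[1] > 0.5, "Anger", "Joy")))  # final step
```

A Joy utterance whose mean pitch happens to fall below the step-1 threshold, such as `[140.0, 0.3, 40.0]`, is routed left and comes out as Neutral. Because `leaves(cascade.left)` contains only Sadness and Neutral, no later step can correct it.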
Such error propagation may lead to unsatisfactory recognition of the emotions decided last, Anger and Joy. To address this problem, a fuzzy-theory-based CB-process is proposed, in which fuzzy theory is applied to both the classification and the decision process. The experimental evaluation shows improved recognition performance.

In addition, a Boosting-based CB-process is proposed, with the aim of building the optimal model at each step of the CB-process. During Boosting training, samples that are difficult to classify receive more intensive training, which yields the optimal classification model for each step. The experimental results show that the Boosting-based CB-process achieves improved recognition performance.
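One way to soften the cascade for the fuzzy-theory-based CB-process works as follows:

1. Each step outputs a membership degree in [0, 1], here a sigmoid of the distance to the threshold, instead of a hard left/right decision.
2. Each leaf receives the product of the degrees along its path (the algebraic-product t-norm).
3. The label with the highest degree is chosen.

The abstract does not specify the thesis's membership functions, t-norm, or defuzzification rule. All of these, together with the tree shape and thresholds, are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Union

@dataclass
class FuzzyNode:
    """A bisecting step returning the degree (0..1) to which x belongs to the left branch."""
    left_degree: Callable[[Sequence[float]], float]
    left: Union[str, "FuzzyNode"]
    right: Union[str, "FuzzyNode"]

def sigmoid_membership(index, threshold, scale, below=True):
    """Soft version of x[index] < threshold (or > threshold when below=False)."""
    sign = 1.0 if below else -1.0
    def degree(x):
        z = sign * (x[index] - threshold) / scale
        z = max(-60.0, min(60.0, z))  # avoid math.exp overflow far from the threshold
        return 1.0 / (1.0 + math.exp(z))
    return degree

def fuzzy_scores(tree, x, degree=1.0, out=None):
    """Send x down every branch; a leaf's score is the product of degrees on its path."""
    if out is None:
        out = {}
    if isinstance(tree, FuzzyNode):
        mu = tree.left_degree(x)
        fuzzy_scores(tree.left, x, degree * mu, out)
        fuzzy_scores(tree.right, x, degree * (1.0 - mu), out)
    else:
        out[tree] = max(out.get(tree, 0.0), degree)
    return out

def fuzzy_decide(tree, x):
    """Defuzzify by picking the leaf label with the largest score."""
    scores = fuzzy_scores(tree, x)
    return max(scores, key=scores.get), scores

# Hypothetical soft cascade; x = [pitch_mean_hz, energy_mean, pitch_range_hz].
soft = FuzzyNode(sigmoid_membership(0, 150.0, 10.0),
                 FuzzyNode(sigmoid_membership(1, 0.2, 0.05), "Sadness", "Neutral"),
                 FuzzyNode(sigmoid_membership(2, 80.0, 10.0, below=False), "Fear",
                           FuzzyNode(sigmoid_membership(1, 0.5, 0.05, below=False),
                                     "Anger", "Joy")))
label, scores = fuzzy_decide(soft, [148.0, 0.25, 40.0])
```

Consider the sample in the example under a crisp cascade with the same thresholds. It would take the left branch at step 1 (148 < 150) and could only answer Sadness or Neutral. The soft cascade keeps the right branch in play, and the sample ends at Joy (about 0.44 versus 0.40 for Neutral).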
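The abstract does not say which Boosting variant is used. The sketch below uses discrete AdaBoost with single-feature decision stumps as an assumed stand-in for training the binary classifier of one CB-process step. After each round, the samples the current stump misclassifies have their weights increased, so later rounds concentrate on the hard samples. Labels are -1 and +1 (for example, Joy = -1 and Anger = +1 at the final step). The function names and the number of rounds are illustrative.

```python
import numpy as np

def stump_predict(X, j, thr, polarity):
    """Predict +1 where polarity * (x_j - thr) >= 0, else -1."""
    return np.where(polarity * (X[:, j] - thr) >= 0, 1, -1)

def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for polarity in (1, -1):
                err = np.sum(w[stump_predict(X, j, thr, polarity) != y])
                if err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def adaboost(X, y, rounds=20):
    """Discrete AdaBoost: reweight samples so misclassified ones get more attention."""
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y * stump_predict(X, j, thr, pol))  # raises weight of errors
        w = w / w.sum()
        model.append((alpha, j, thr, pol))
    return model

def adaboost_predict(model, X):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(X, j, thr, pol) for a, j, thr, pol in model)
    return np.where(score >= 0, 1, -1)

# Usage on a toy two-feature problem.
Xb = np.array([[0.1, 3.0], [0.2, 1.0], [0.3, 2.0], [0.7, 1.5], [0.8, 2.5], [0.9, 0.5]])
yb = np.array([-1, -1, -1, 1, 1, 1])
model = adaboost(Xb, yb, rounds=5)
```

In a CB-process, one such model would be trained per bisecting step, using only the samples that reach that step.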
Keywords/Search Tags: Emotion recognition by speech signal, Mandarin emotional speech, Acoustic analysis, Mandarin word stress, Analysis between pairs of emotions, Decision tree theory, Cascade Bisection process (CB-process), Critical band analysis, Fuzzy theory