Research On Speech Emotion Recognition

Posted on: 2007-10-04  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Y L Lin  Full Text: PDF
GTID: 1118360185964853  Subject: Communication and Information System
Abstract/Summary:
Emotions play a significant role in human perception and decision making. Emotional intelligence has long been studied in psychology and cognitive science. With the development of artificial intelligence in recent years, combining emotional intelligence with computer technology has given rise to a new research area, affective computing, which is expected to advance computer technology considerably. Automatic recognition of human emotion is the first step toward affective computing. Speech, the most important medium of human communication, carries a great deal of information about the speaker's emotional state, so automatically recognizing that state has attracted researchers from many fields. Recent studies on emotion recognition based on acoustic features, especially those on Mandarin, still have drawbacks. For example, no feature set has been found that is as widely applicable as MFCC is in speech recognition. Furthermore, recognition rates are not yet high enough for practical use. Focusing on four emotional states (anger, happiness, sadness and surprise) plus a neutral state, all of which are common in daily life, this dissertation studies emotion recognition from the speech signal. The main achievements are as follows:

1. About two hundred speech features derived from pitch, short-term energy, formants, MFCC (Mel-Frequency Cepstral Coefficients) and Mel-frequency sub-band energy are studied for recognizing speakers' emotional states. A recognition approach based on feature selection and a support vector machine (SVM) is proposed. Experimental results show that introducing features beyond pitch, short-term energy and formants improves the performance of the recognition system based on statistical features. Furthermore, because the candidate feature set is comparatively large, redundancy in it is unavoidable. Feature selection is therefore introduced, which improves recognition performance while reducing computational complexity.

2. Because accurate values of some speech features, such as pitch and formants, are hard to estimate, a new feature vector named Mel frequency energy dynamics coefficients (MFEDC) is proposed. Its main advantage is that it is very simple to compute. Experimental results show that the...
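
As an illustration of the kind of statistical features derived from short-term energy mentioned in item 1, the following is a minimal Python sketch, not the dissertation's implementation. The frame length (400 samples), hop size (160 samples), 16 kHz sampling rate, the particular statistics chosen, and the function names `short_term_energy` and `contour_statistics` are all assumptions made for this example.

```python
import math
import statistics


def short_term_energy(samples, frame_len=400, hop=160):
    """Return the energy (sum of squared samples) of each full frame.

    frame_len and hop are assumed values (25 ms / 10 ms at 16 kHz).
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies


def contour_statistics(contour):
    """Summarize a frame-level contour as utterance-level statistics."""
    lowest, highest = min(contour), max(contour)
    return {
        "mean": statistics.fmean(contour),
        "std": statistics.pstdev(contour),
        "min": lowest,
        "max": highest,
        "range": highest - lowest,
    }


# Toy signal (assumed): one second of a 200 Hz sine sampled at 16 kHz,
# with an amplitude that ramps linearly from 0 to 1.
sr = 16000
signal = [(n / sr) * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr)]

energy = short_term_energy(signal)
features = contour_statistics(energy)
```

The same `contour_statistics` helper could be applied to other frame-level contours, such as pitch or formant tracks, to build up a larger candidate feature set before feature selection.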
Keywords/Search Tags: Speech, Emotion, Emotion Recognition