
A Study On Recognition Of Emotions In Speech

Posted on: 2008-09-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X C Jin
Full Text: PDF
GTID: 1118360212999060
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Speech is one of the most convenient means of communication between people, and it is a fundamental channel for conveying emotion as well as semantic information. Emotion plays an important role in communication, so emotion information processing in speech signals has gained increasing attention in recent years, as the need for machines to understand humans well in human-machine interaction has grown. As one of the main branches of emotion information processing in speech, emotion recognition in speech is fundamental to natural human-machine communication. However, research on human emotion is still at an exploratory stage: there is no generally accepted definition of human emotion, emotion has strong social and cultural characteristics, and speech signals carry complex information. All of these factors make emotion recognition in human speech, a field still in its infancy, a great challenge.

In order to establish a speaker-independent speech emotion recognition system that does not rely on context or linguistic information, this dissertation focuses on the construction of an emotional speech corpus, acoustic feature extraction, analysis and selection of emotional features, the emotion dimension space, emotion modeling, and emotion recognition. Based on the analysis of an adequate number of emotional speech samples, two emotion modeling methods are presented, which provide a theoretical and technical framework for emotion recognition in spoken language. On the basis of these studies, two emotion recognition algorithms are implemented and a speaker- and content-independent Mandarin emotion recognition system is completed.

The innovative points and main contributions of this dissertation are as follows:

(1) An algorithm based on the modified cepstrum is presented for estimating the fundamental frequency (F0) of speech signals.
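The abstract does not give the details of the modified cepstrum method, but the underlying idea of cepstral F0 estimation can be sketched as follows. All function names, frame sizes, and pitch-range bounds here are illustrative assumptions, not taken from the dissertation:

```python
import numpy as np

def cepstral_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate F0 of one voiced frame via the real cepstrum.

    A peak in the cepstrum at quefrency q (in samples) implies
    F0 = fs / q; fmin/fmax bound the search to a plausible pitch range.
    """
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-10)
    cepstrum = np.fft.irfft(log_mag)
    # Quefrency search range corresponding to [fmin, fmax]
    q_lo = int(fs / fmax)
    q_hi = int(fs / fmin)
    peak = q_lo + np.argmax(cepstrum[q_lo:q_hi])
    return fs / peak

# Synthetic voiced frame: 200 Hz fundamental plus two harmonics at 16 kHz
fs = 16000
t = np.arange(2048) / fs
frame = sum(np.sin(2 * np.pi * 200 * k * t) for k in (1, 2, 3))
f0 = cepstral_f0(frame, fs)  # close to 200 Hz
```

The dissertation's contribution goes beyond this baseline: the cepstral peak is combined with zero-crossing rate and short-time energy for voicing decisions, and dynamic programming enforces F0 continuity across frames.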
Voicing decisions are made with a decision function composed of the cepstral peak, the zero-crossing rate, and the energy of short-time speech segments, yielding an accurate voiced/unvoiced classification. A dynamic programming method then performs pitch tracking, with the continuity of F0 taken fully into account in the cost function. The proposed algorithm effectively avoids spurious pitch doubling and halving while preserving legitimate doubling and halving of F0, and it produces an accurate, smooth F0 contour that needs no further smoothing.

(2) The relationships between emotional states and the acoustic features of speech, including prosody and voice quality, are analyzed. The inadequacy of short-time energy for distinguishing emotional states is pointed out; on the other hand, the proportion of energy below 250 Hz to the total energy is found to be a promising feature for emotion recognition in speech. The characteristics of the pitch contour and the pitch derivative are also analyzed for the purpose of emotion recognition. In addition, differences in emotional acoustic features between male and female speech are identified, and a gender classification method is developed based on these findings: the mean, range, and variance of F0 are used as features, and a Fisher linear discriminant function separates male from female speech. Experimental results show that the proposed method achieves high accuracy.

(3) A concept of an emotion space model based on results from psychological research is presented, and a perceptual experiment is reported. In the experiment, we studied how the six basic emotions of Mandarin are distributed in the emotion space.
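The gender classification step described in point (2) can be sketched with a standard Fisher linear discriminant on the three F0 statistics. The data below is synthetic and the thresholding rule is an assumption; the dissertation's actual corpus and decision rule are not given in the abstract:

```python
import numpy as np

def fisher_discriminant(X1, X2):
    """Fisher linear discriminant: w = Sw^{-1} (m1 - m2),
    with the threshold at the midpoint of the projected class means."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    Sw = (np.cov(X1, rowvar=False) * (len(X1) - 1)
          + np.cov(X2, rowvar=False) * (len(X2) - 1))  # within-class scatter
    w = np.linalg.solve(Sw, m1 - m2)
    threshold = w @ (m1 + m2) / 2
    return w, threshold

# Toy F0 statistics [mean (Hz), range (Hz), variance]; male F0 is lower
rng = np.random.default_rng(0)
male = rng.normal([120, 60, 400], [15, 10, 60], size=(50, 3))
female = rng.normal([220, 90, 700], [20, 15, 90], size=(50, 3))

w, th = fisher_discriminant(male, female)
acc = ((male @ w > th).mean() + (female @ w <= th).mean()) / 2
```

On well-separated toy data like this, the projection separates the two classes almost perfectly, consistent with the high accuracy the abstract reports for real speech.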
Furthermore, we studied the relationships between the prosodic and quality features and the mean ratings in the two-dimensional space of arousal and valence.

(4) For emotion modeling, the dissertation describes the emotion space using an emotion field and emotional potency, introducing the concepts of the data field and the potential function into emotion modeling. In this method, any emotion in the emotion space can be seen as a composite of all the basic emotions considered in this research, and the contribution of each basic emotion is determined by the emotional potency that it induces at the location of the emotion in question. The center of each basic emotion is found with a hill-climbing algorithm. The emotion recognition algorithm based on this model performs better than traditional methods.

(5) A dimension-based emotion model is presented according to the relationships between the acoustic features of speech and the emotion dimensions. In this modeling method, prosodic features are used to construct statistical arousal models, and quality features are used to construct statistical valence models. The probability outputs of all these dimension models are then taken as features to build the emotion category models. GMMs are chosen for the emotion dimension models, and a new clustering-based algorithm is proposed for estimating the GMMs' initial parameters. SVMs are used to build the emotion category models. Experimental results indicate that the emotion recognition algorithm based on this model performs better than the emotion field method.

The two emotion modeling methods proposed in this dissertation, both scientifically grounded and well performing, point the way for future work on emotion recognition in spoken language.
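The two-stage scheme in point (5), GMM dimension models whose likelihoods feed an SVM category classifier, can be sketched on toy data. Everything below (feature dimensions, class layout, number of mixture components) is a hypothetical stand-in for the dissertation's setup; the k-means initialisation merely approximates the clustering-based parameter estimation the abstract mentions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Toy data: 4 emotion classes, each with 2 prosodic and 2 quality features
def make_class(center_p, center_q, n=60):
    return np.hstack([rng.normal(center_p, 0.5, (n, 2)),
                      rng.normal(center_q, 0.5, (n, 2))])

centers = [((0, 0), (0, 0)), ((3, 0), (0, 3)), ((0, 3), (3, 0)), ((3, 3), (3, 3))]
X = np.vstack([make_class(p, q) for p, q in centers])
y = np.repeat(np.arange(4), 60)
prosodic, quality = X[:, :2], X[:, 2:]

# Stage 1: one GMM per class on each feature group (the "dimension models")
gmms_p = [GaussianMixture(2, random_state=0).fit(prosodic[y == c]) for c in range(4)]
gmms_q = [GaussianMixture(2, random_state=0).fit(quality[y == c]) for c in range(4)]

# Stage 2: per-class log-likelihoods become the features for an SVM
def dim_features(Xp, Xq):
    return np.column_stack([g.score_samples(Xp) for g in gmms_p] +
                           [g.score_samples(Xq) for g in gmms_q])

svm = SVC().fit(dim_features(prosodic, quality), y)
acc = svm.score(dim_features(prosodic, quality), y)
```

The design point is that the SVM never sees raw acoustics, only how well each dimension model explains a sample, which is what lets prosody and voice quality contribute through separate arousal and valence models.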
Keywords/Search Tags:Affective computing, emotion recognition, fundamental frequency estimation, emotion dimension, emotion space, emotion field, emotion modeling