
Vocal Emotion Analysis For Mandarin Speech

Posted on: 2012-05-14
Degree: Master
Type: Thesis
Country: China
Candidate: M H Sun
Full Text: PDF
GTID: 2218330338965354
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of multi-channel human-machine interaction technology, the study of human-machine interaction patterns has great potential in a wide range of applications. As one of the most natural and effective modes of human communication, the speech modality has attracted increasing interest among researchers and industrial developers. Since speech conveys emotion just as facial expressions do, processing emotional speech has significant practical value.

This thesis investigates key technologies of emotional speech processing, focusing on speech emotion recognition based on an emotional Mandarin speech database. The utterances in the database are colored by six basic emotions: anger, fear, happiness, sadness, neutral and surprise. Both MFCCs and prosodic parameters are extracted as feature vectors. To measure the acoustic variations of emotional speech relative to neutral speech, a statistical analysis of prosodic features, including pitch and energy contours as well as time-domain parameters, is performed first. Because voiced and unvoiced sounds carry emotional cues differently, a segment-based approach is used to investigate the prosodic features in detail; compared with the traditional utterance-based approach, it proves more effective for emotion recognition.

Two vocal emotion classifiers, one using a GMM and one using KNN, are trained and evaluated on the MFCCs and the prosodic features respectively. Under optimal parameters, the recognition rate of the GMM classifier reaches 72.34%, while the highest recognition rate of the KNN classifier on the segment-based prosodic features is 64.89%.
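The segment-based prosodic statistics described above can be illustrated with a minimal numpy sketch. It assumes frame-level pitch and energy contours are already available, marks a frame as voiced when its pitch is nonzero, and computes a small illustrative statistic set (pitch mean and range, mean energy, duration) per voiced segment; the exact feature set and voicing decision in the thesis may differ.

```python
import numpy as np

def voiced_segments(pitch):
    """Find runs of consecutive voiced frames (pitch > 0)."""
    voiced = pitch > 0
    # indices where the voicing flag flips, used as segment boundaries
    edges = np.flatnonzero(np.diff(voiced.astype(int)))
    bounds = np.concatenate(([0], edges + 1, [len(pitch)]))
    return [(s, e) for s, e in zip(bounds[:-1], bounds[1:]) if voiced[s]]

def segment_features(pitch, energy):
    """Per-segment prosodic statistics: pitch mean/range, energy mean, duration."""
    feats = []
    for s, e in voiced_segments(pitch):
        p, en = pitch[s:e], energy[s:e]
        feats.append([p.mean(), p.max() - p.min(), en.mean(), e - s])
    return np.array(feats)

# toy contours: two voiced segments separated by an unvoiced gap
pitch = np.array([0, 0, 200, 210, 220, 0, 0, 180, 185, 0], dtype=float)
energy = np.array([.1, .1, .8, .9, .7, .1, .1, .6, .65, .1])
f = segment_features(pitch, energy)
print(f.shape)  # one 4-dimensional feature vector per voiced segment
```

Classifying each voiced segment separately, then combining the segment decisions over the utterance, is what distinguishes this approach from the traditional utterance-level statistics.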
For both classifiers, the influence of the parameter settings is investigated and their limitations are then discussed.

An HTK-based emotional speech recognition system is also implemented, which recognizes both the emotion and the content of emotional speech. Phone-level HMM models are built for each emotion to compensate for the distortion that emotion introduces into the feature vectors; the overall speech recognition rate is about 50%.

Finally, the thesis summarizes the completed work and the problems that remain open, and proposes directions for future work on vocal emotion representations and classification algorithms.
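Both the GMM classifier and the per-emotion HMM system share the same decision rule: score the utterance under each emotion's model and pick the one with the highest likelihood. A minimal numpy sketch of that rule, using diagonal-covariance GMMs with toy hand-set parameters (in the thesis the models would be trained on the extracted MFCC/prosodic features):

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Average log-likelihood of frames X under a diagonal-covariance GMM."""
    # per-component log N(x | mu_k, diag(var_k)), shape (n_frames, n_components)
    diff = X[:, None, :] - means[None, :, :]
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_comp += np.log(weights)
    # log-sum-exp over components, then average over frames
    m = log_comp.max(axis=1, keepdims=True)
    return np.mean(m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1)))

def classify(X, models):
    """Pick the emotion whose model assigns the utterance the highest likelihood."""
    return max(models, key=lambda emo: gmm_loglik(X, *models[emo]))

# two toy single-component "GMMs": one centred at 0, one at 5
models = {
    "neutral": (np.array([1.0]), np.zeros((1, 2)), np.ones((1, 2))),
    "angry":   (np.array([1.0]), np.full((1, 2), 5.0), np.ones((1, 2))),
}
X = np.full((10, 2), 4.8)  # frames near the "angry" centre
print(classify(X, models))  # -> angry
```

In the HTK system the per-emotion models are phone-level HMMs rather than GMMs, but the argmax-over-emotions selection works the same way.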
Keywords/Search Tags:Vocal emotion recognition, Segment-based prosodic features, Emotional speech recognition system