
Vocal Emotion Analysis For Mandarin Speech

Posted on: 2012-05-14
Degree: Master
Type: Thesis
Country: China
Candidate: M H Sun
Full Text: PDF
GTID: 2218330338965354
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of multi-channel human-machine interaction technology, the study of human-machine interaction patterns has great potential in a wide range of applications. As one of the most natural and effective modes of human communication, the speech modality has attracted increasing interest among researchers and industrial developers. Since speech conveys emotion just as facial expressions do, processing emotional speech has significant practical value.

This thesis investigates key technologies of emotional speech processing, focusing on speech emotion recognition based on an emotional Mandarin speech database. The utterances in the database are colored by six basic emotions: anger, fear, happiness, sadness, neutral and surprise. Both MFCCs and prosodic parameters are extracted as feature vectors. To measure the acoustic variations of emotional speech relative to neutral speech, a statistical analysis of prosodic features, including pitch and energy contours as well as time-domain parameters, is performed first. Because voiced and unvoiced sounds carry emotional cues differently, a segment-based approach is used to investigate the prosodic features in detail; compared with the traditional utterance-based approach, it proves more effective for emotion recognition.

Two vocal emotion classifiers, one using a GMM and one using KNN, are trained and evaluated on the MFCCs and the prosodic features respectively. Under optimal parameters, the recognition rate of the GMM classifier reaches 72.34%, while the highest recognition rate of the KNN classifier on the segment-based prosodic features is 64.89%.
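The segment-based prosodic statistics described above can be illustrated with a minimal numpy sketch. It assumes frame-level pitch and energy contours are already available, marks a frame as voiced when its pitch is nonzero, and computes a small illustrative statistic set (pitch mean and range, mean energy, duration) per voiced segment; the exact feature set and voicing decision in the thesis may differ.

```python
import numpy as np

def voiced_segments(pitch):
    """Find runs of consecutive voiced frames (pitch > 0)."""
    voiced = pitch > 0
    # indices where the voicing flag flips, used as segment boundaries
    edges = np.flatnonzero(np.diff(voiced.astype(int)))
    bounds = np.concatenate(([0], edges + 1, [len(pitch)]))
    return [(s, e) for s, e in zip(bounds[:-1], bounds[1:]) if voiced[s]]

def segment_features(pitch, energy):
    """Per-segment prosodic statistics: pitch mean/range, energy mean, duration."""
    feats = []
    for s, e in voiced_segments(pitch):
        p, en = pitch[s:e], energy[s:e]
        feats.append([p.mean(), p.max() - p.min(), en.mean(), e - s])
    return np.array(feats)

# toy contours: two voiced segments separated by an unvoiced gap
pitch = np.array([0, 0, 200, 210, 220, 0, 0, 180, 185, 0], dtype=float)
energy = np.array([.1, .1, .8, .9, .7, .1, .1, .6, .65, .1])
f = segment_features(pitch, energy)
print(f.shape)  # one 4-dimensional feature vector per voiced segment
```

Classifying each voiced segment separately, then combining the segment decisions over the utterance, is what distinguishes this approach from the traditional utterance-level statistics.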
For both classifiers, the influence of the parameter settings is investigated and their limitations are then discussed.

An HTK-based emotional speech recognition system is also implemented, which recognizes both the emotion and the content of emotional speech. Phone-level HMM models are built for each emotion to compensate for the distortion that emotion introduces into the feature vectors; the overall speech recognition rate is about 50%.

Finally, the thesis summarizes the completed work and the problems that remain open, and proposes directions for future work on vocal emotion representations and classification algorithms.
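Both the GMM classifier and the per-emotion HMM system share the same decision rule: score the utterance under each emotion's model and pick the one with the highest likelihood. A minimal numpy sketch of that rule, using diagonal-covariance GMMs with toy hand-set parameters (in the thesis the models would be trained on the extracted MFCC/prosodic features):

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Average log-likelihood of frames X under a diagonal-covariance GMM."""
    # per-component log N(x | mu_k, diag(var_k)), shape (n_frames, n_components)
    diff = X[:, None, :] - means[None, :, :]
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_comp += np.log(weights)
    # log-sum-exp over components, then average over frames
    m = log_comp.max(axis=1, keepdims=True)
    return np.mean(m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1)))

def classify(X, models):
    """Pick the emotion whose model assigns the utterance the highest likelihood."""
    return max(models, key=lambda emo: gmm_loglik(X, *models[emo]))

# two toy single-component "GMMs": one centred at 0, one at 5
models = {
    "neutral": (np.array([1.0]), np.zeros((1, 2)), np.ones((1, 2))),
    "angry":   (np.array([1.0]), np.full((1, 2), 5.0), np.ones((1, 2))),
}
X = np.full((10, 2), 4.8)  # frames near the "angry" centre
print(classify(X, models))  # -> angry
```

In the HTK system the per-emotion models are phone-level HMMs rather than GMMs, but the argmax-over-emotions selection works the same way.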
Keywords/Search Tags:Vocal emotion recognition, Segment-based prosodic features, Emotional speech recognition system