
Research On Techniques Of Neural Network Based Speech Emotion Recognition

Posted on: 2008-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: W J Han
Full Text: PDF
GTID: 2178360245497750
Subject: Computer Science and Technology
Abstract/Summary:
Emotions play a significant role in human perception and decision making. With the rapid development of artificial intelligence in recent years, improving human-machine interaction through emotional intelligence has great theoretical and practical significance. Speech, as the most important medium of human communication, carries a great deal of emotional information about the speaker, so automatically recognizing emotional states from speech has attracted attention from research institutions at home and abroad. The research on speech-based emotion recognition presented here focuses on four emotional states commonly found in daily life: anger, happiness, sadness and surprise.

Firstly, the ability of prosodic and formant features to discriminate emotional states is analyzed on the emotional corpus constructed previously; all emotion features are derived from pitch, short-term energy, speech rate and formants. Then, an Elman recurrent network based model is used to recognize emotions from speech. Compared with the widely used MLP, the Elman network is able to process temporal features and is therefore closer to the continuous hearing mechanism of the human ear.

Utterance-based global statistic features and frame-based temporal features have been widely used in speech emotion recognition, but the rationality of the term lengths they are based on has not been verified. Therefore, features based on utterance segments are extracted and used, and a concept named "the best segment length for recognition" is proposed. Experimental results show that the recognition rate of the system is strongly correlated with the segment length; the best system performance is obtained with segment-based features computed over segments of 140 speech frames, improving the recognition rate by 4.2% compared with using global features.
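The core mechanism described above can be sketched as a forward pass of an Elman network: a context layer feeds the previous hidden state back into the current step, which is what lets the model exploit temporal features. The following is a minimal illustration, not the thesis's actual model; all dimensions, weights, and the 4-feature frame vector are hypothetical assumptions for the sketch (only the 140-frame segment length and the four emotion classes come from the text).

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEAT = 4      # hypothetical per-frame features, e.g. pitch, energy, speech rate, formant
N_HIDDEN = 8    # hypothetical number of hidden (context) units
N_EMOTION = 4   # anger, happiness, sadness, surprise

# Randomly initialized weights: stand-ins for trained parameters.
W_in = rng.normal(scale=0.1, size=(N_HIDDEN, N_FEAT))
W_ctx = rng.normal(scale=0.1, size=(N_HIDDEN, N_HIDDEN))  # context feedback
W_out = rng.normal(scale=0.1, size=(N_EMOTION, N_HIDDEN))

def elman_forward(frames):
    """Process a segment frame by frame; the context layer carries the
    previous hidden state, giving the network its temporal memory."""
    h = np.zeros(N_HIDDEN)
    for x in frames:
        h = np.tanh(W_in @ x + W_ctx @ h)  # current input + context feedback
    logits = W_out @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()  # softmax posterior over the four emotion classes

# A hypothetical segment of 140 frames, the "best segment length" reported above.
segment = rng.normal(size=(140, N_FEAT))
probs = elman_forward(segment)
```

In a real system the random frames would be replaced by segment-based acoustic features and the weights learned by backpropagation through time.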
By comparing the recognition results obtained with global statistic features and with temporal features, it is concluded that the two kinds of features tend to reflect different aspects of emotion. Finally, a novel model named Global Control Elman is proposed to combine the two kinds of features. The accuracy obtained by feature combination is higher than that obtained by either kind of feature alone, reaching a maximum of 66.0%. The work described above lays a good foundation for further research on speech emotion recognition technology.
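The benefit of combining complementary features can be illustrated with a simple score-level fusion of two classifiers' posteriors, one from global statistic features and one from temporal features. To be clear, this is not the Global Control Elman architecture (whose internals are not specified here); it is only a generic weighted combination, and the posterior values below are invented for illustration.

```python
import numpy as np

def fuse(p_global, p_temporal, alpha=0.5):
    """Weighted average of two posterior distributions, renormalized.
    alpha controls how much weight the global-feature classifier gets."""
    p = alpha * np.asarray(p_global) + (1.0 - alpha) * np.asarray(p_temporal)
    return p / p.sum()

# Hypothetical posteriors over (anger, happiness, sadness, surprise).
p_g = [0.40, 0.30, 0.20, 0.10]  # global statistic features favor anger
p_t = [0.25, 0.45, 0.15, 0.15]  # temporal features favor happiness
fused = fuse(p_g, p_t)
pred = int(np.argmax(fused))    # class index chosen after fusion
```

Because each feature type captures a different aspect of emotion, the fused distribution can favor a class that neither classifier alone is fully confident about, which is the intuition behind the combined model's higher accuracy.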
Keywords/Search Tags: Speech Emotion Recognition, Acoustic Feature, Artificial Neural Network, Elman Network