
Research On Techniques Of Neural Network Based Speech Emotion Recognition

Posted on: 2008-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: W J Han
Full Text: PDF
GTID: 2178360245497750
Subject: Computer Science and Technology
Abstract/Summary:
Emotions play a significant role in human perception and decision making. With the rapid development of artificial intelligence in recent years, improving human-machine interaction through emotional intelligence has great theoretical and practical significance. Speech, as the most important medium of human communication, carries a great deal of emotional information about the speaker, so automatically recognizing emotional states from speech has attracted attention from research institutions at home and abroad. The research on speech-based emotion recognition presented here focuses on four emotional states commonly found in daily life: anger, happiness, sadness and surprise.

Firstly, the ability of prosodic and formant features to discriminate emotional states is analyzed on the emotional corpus constructed previously; all emotion features are derived from pitch, short-term energy, speech rate and formants. Then, an Elman recurrent network based model is used to recognize emotions from speech. Compared with the widely used MLP, the Elman network is able to process temporal features and is therefore closer to the continuous hearing mechanism of the human ear.

Utterance-based global statistic features and frame-based temporal features have been widely used in speech emotion recognition, but the rationality of the term lengths they are based on has not been verified. Therefore, features based on utterance segments are extracted and used, and a concept named "the best segment length for recognition" is proposed. Experimental results show that the recognition rate of the system is strongly correlated with the segment length; the best system performance is obtained with segment-based features computed over segments of 140 speech frames, improving the recognition rate by 4.2% compared with using global features.
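The core mechanism described above can be sketched as a forward pass of an Elman network: a context layer feeds the previous hidden state back into the current step, which is what lets the model exploit temporal features. The following is a minimal illustration, not the thesis's actual model; all dimensions, weights, and the 4-feature frame vector are hypothetical assumptions for the sketch (only the 140-frame segment length and the four emotion classes come from the text).

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEAT = 4      # hypothetical per-frame features, e.g. pitch, energy, speech rate, formant
N_HIDDEN = 8    # hypothetical number of hidden (context) units
N_EMOTION = 4   # anger, happiness, sadness, surprise

# Randomly initialized weights: stand-ins for trained parameters.
W_in = rng.normal(scale=0.1, size=(N_HIDDEN, N_FEAT))
W_ctx = rng.normal(scale=0.1, size=(N_HIDDEN, N_HIDDEN))  # context feedback
W_out = rng.normal(scale=0.1, size=(N_EMOTION, N_HIDDEN))

def elman_forward(frames):
    """Process a segment frame by frame; the context layer carries the
    previous hidden state, giving the network its temporal memory."""
    h = np.zeros(N_HIDDEN)
    for x in frames:
        h = np.tanh(W_in @ x + W_ctx @ h)  # current input + context feedback
    logits = W_out @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()  # softmax posterior over the four emotion classes

# A hypothetical segment of 140 frames, the "best segment length" reported above.
segment = rng.normal(size=(140, N_FEAT))
probs = elman_forward(segment)
```

In a real system the random frames would be replaced by segment-based acoustic features and the weights learned by backpropagation through time.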
By comparing the recognition results obtained with global statistic features and with temporal features, it is concluded that the two kinds of features tend to reflect different aspects of emotion. Finally, a novel model named Global Control Elman is proposed to combine the two kinds of features. The accuracy obtained by feature combination is higher than that obtained by either kind of feature alone, reaching a maximum of 66.0%. The work described above lays a good foundation for further research on speech emotion recognition technology.
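The benefit of combining complementary features can be illustrated with a simple score-level fusion of two classifiers' posteriors, one from global statistic features and one from temporal features. To be clear, this is not the Global Control Elman architecture (whose internals are not specified here); it is only a generic weighted combination, and the posterior values below are invented for illustration.

```python
import numpy as np

def fuse(p_global, p_temporal, alpha=0.5):
    """Weighted average of two posterior distributions, renormalized.
    alpha controls how much weight the global-feature classifier gets."""
    p = alpha * np.asarray(p_global) + (1.0 - alpha) * np.asarray(p_temporal)
    return p / p.sum()

# Hypothetical posteriors over (anger, happiness, sadness, surprise).
p_g = [0.40, 0.30, 0.20, 0.10]  # global statistic features favor anger
p_t = [0.25, 0.45, 0.15, 0.15]  # temporal features favor happiness
fused = fuse(p_g, p_t)
pred = int(np.argmax(fused))    # class index chosen after fusion
```

Because each feature type captures a different aspect of emotion, the fused distribution can favor a class that neither classifier alone is fully confident about, which is the intuition behind the combined model's higher accuracy.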
Keywords/Search Tags: Speech Emotion Recognition, Acoustic Feature, Artificial Neural Network, Elman Network