The Research Of Dimensional Speech Emotion Recognition Based On Neural Network And Fusion Features

Posted on: 2019-09-21
Degree: Master
Type: Thesis
Country: China
Candidate: X X Zhou
GTID: 2428330545972905
Subject: Computer technology
Abstract/Summary:
As a primary mode of expression and communication, speech conveys not only what people say but also a great deal of emotional information. This makes speech emotion recognition one of the most important research topics in intelligent speech information processing. Speech emotion recognition aims to help machines detect people's affect from the speech signal and understand their emotional state, making machines smarter. Dimensional speech emotion recognition is a new research direction within this field. Dimensional affect describes emotions with precise numerical values rather than categories, from a multi-dimensional and continuous viewpoint. It overcomes the fuzziness and limited number of discrete emotion labels, so it can represent people's affect more naturally and with greater descriptive power. Dimensional speech emotion recognition has attracted the attention of some scholars, but the emotional features and recognition algorithms still need improvement, and recognition rates remain unsatisfactory. Building on traditional discrete emotion classification, this thesis starts with the recognition of discretized dimensional speech emotions and then moves on to dimensional affect. The main work of this thesis is as follows.

First, a recognition system for discretized dimensional speech emotions is proposed, based on disfluencies and non-verbal vocalisations features and a bidirectional long short-term memory recurrent neural network model. These features are built from typical emotional words, and they have an advantage in quantity over traditional speech emotion features. In addition, a key strength of the bidirectional long short-term memory recurrent neural network is that it learns in both directions. Experiments on the Audio Video Emotion Challenge (AVEC) 2012 database show that the proposed method improves classification performance by more than 10%.

Second, a method that combines disfluencies and non-verbal vocalisations features with low-level descriptors features is proposed to recognize turn-level dimensional affect. The sentence emotions of AVEC 2012 are first generated from the word emotions in AVEC 2012. The calculation of the disfluencies and non-verbal vocalisations features is then refined by using the real length of each sentence rather than a hypothetical length (a sentence length fixed at 15 words). Experiments carried out on this basis show that combining the two feature types performs better than either feature type alone.
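The refinement described in the abstract (dividing by the real sentence length instead of a fixed 15 words) and the combination of the two feature types can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the thesis's implementation: the marker categories and tokens in `DIS_NV_MARKERS`, the count-based ratio, the function names `dis_nv_features` and `fuse`, and the choice of plain concatenation as the fusion step. The abstract states none of these details.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical marker categories and tokens, chosen only for illustration.
# The abstract does not list the actual categories or vocabulary.
DIS_NV_MARKERS: Dict[str, Set[str]] = {
    "filled_pause": {"um", "uh", "er"},
    "filler": {"well", "like"},
    "laughter": {"<laugh>"},
}

# The hypothetical sentence length (15 words) mentioned in the abstract.
FIXED_LENGTH = 15


def dis_nv_features(words: Sequence[str], use_real_length: bool = True) -> List[float]:
    """Return one ratio per marker category for a single sentence.

    Each ratio is the count of marker tokens divided either by the real
    sentence length or by the fixed length of 15 words. This count-based
    definition is an assumption, not the thesis's exact formula.
    """
    if not words:
        return [0.0] * len(DIS_NV_MARKERS)
    denominator = len(words) if use_real_length else FIXED_LENGTH
    return [
        sum(word.lower() in tokens for word in words) / denominator
        for tokens in DIS_NV_MARKERS.values()
    ]


def fuse(dis_nv: Sequence[float], lld: Sequence[float]) -> List[float]:
    """Combine the two feature vectors by concatenation (assumed fusion scheme)."""
    return list(dis_nv) + list(lld)


if __name__ == "__main__":
    sentence = ["um", "I", "think", "<laugh>", "so"]
    real = dis_nv_features(sentence, use_real_length=True)
    fixed = dis_nv_features(sentence, use_real_length=False)
    # Placeholder low-level descriptor values; real ones would come from
    # an acoustic feature extractor.
    lld_vector = [0.5, -1.2]
    print("real length :", real)
    print("fixed length:", fixed)
    print("fused vector:", fuse(real, lld_vector))
```

For a five-word sentence with one filled pause, the real-length ratio is 1/5 while the fixed-length ratio is 1/15, so the fixed denominator understates how dense the markers are in short sentences. This is one plausible motivation for the refinement, offered here as an interpretation rather than a claim from the thesis.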
Keywords/Search Tags: dimensional speech affection recognition, discretized dimensional speech emotions, disfluencies and non-verbal vocalisations features, bidirectional long short-term memory recurrent neural network, low-level descriptors features