
Research On Algorithms Of Speech Emotion Recognition Based On Feature Learning

Posted on: 2018-11-23    Degree: Doctor    Type: Dissertation
Country: China    Candidate: C Zha    Full Text: PDF
GTID: 1368330545461050    Subject: Information and Signal Processing
Abstract/Summary:
As an important direction of affective computing, speech emotion recognition can make human-computer interaction more natural and harmonious, and it has therefore attracted increasing research attention. Although great progress has been made in recent decades, the following problems remain. First, there is uncertainty in the definition of the speech emotion model. Two main types of model are in use: discrete emotion categories and continuous emotion dimensions, and different researchers may adopt different models for different speech emotion datasets. The selection of the emotion model is therefore one of the challenges of affective computing. Second, feature learning is the key to improving the performance of speech emotion recognition. With the development of machine learning, researchers have put forward many new feature learning algorithms, so extracting more effective features with these novel algorithms is a key problem whose solution may improve recognition performance. To address these problems and improve recognition performance, we study speech emotion recognition algorithms based on feature learning. The main contributions of this dissertation are as follows:

(1) We propose a multi-level acoustic feature fusion algorithm based on multiple kernel learning, and further a transfer multiple kernel learning algorithm for multi-level feature fusion, which improves the robustness of the features across the training and test datasets. Utterance-level features are commonly used in speech emotion recognition, but sub-utterance-level features may carry additional emotional information beyond the utterance-level ones. Currently, the emotion recognition
algorithms for multi-level acoustic features are mostly based on feature cascading or decision fusion. Such algorithms fail to take full account of the intrinsic relationships among multi-level acoustic features and cannot effectively exploit the emotional information carried at different feature levels. To fuse this information effectively, we study the consistency and uniqueness of the recognition performance of multi-level acoustic features in the multiple-kernel-induced Hilbert space. In addition, to address the distribution mismatch of multi-level acoustic features between the training and test datasets, we propose a novel transfer multiple kernel learning algorithm for multi-level feature fusion, called MFF-TMKL (Multi-level Feature Fusion via Transfer Multiple Kernel Learning). The algorithm imposes two constraints. First, based on the consistency and uniqueness of the recognition performance of the multi-level features, the emotional information from the different feature levels is fused in the multiple-kernel-induced Hilbert space. Second, to reduce the distribution mismatch between the training and test sets, the maximum mean discrepancy is extended from a single kernel space to the multiple kernel space via the geometric interpretation of multiple kernel learning. The two constraints are then used to augment the objective function of traditional multiple kernel learning. To verify the validity of the proposed algorithm, each sentence was segmented at the start/end and voiced/unvoiced levels, and the MFF-TMKL algorithm was evaluated on the Ohm and Mont datasets of the Aibo speech emotion corpus.

(2) We propose a dimensional speech emotion recognition algorithm based on a multi-label deep neural network. Existing dimensional speech emotion recognition algorithms have the following problems. First, they may ignore the different discriminative abilities of the same speech features for different dimensional emotion labels, or may fail to extract emotional information effectively with linear and shallow feature learning. Second, current algorithms often treat dimensionality reduction as a preprocessing step of speech emotion recognition, whereas coupling speech emotion feature learning with its classification model may be more conducive to improving recognition performance. To avoid these problems, we propose a novel dimensional speech emotion recognition algorithm based on a multi-label deep neural network. Learning of the deep network consists of two steps. First, to exploit the correlation among the dimensional emotion labels, graph matching is applied to multiple dimensional emotion labels simultaneously while learning the top layer of the network. Furthermore, since graph matching can only extract deep features of the same dimensionality as the emotion labels, the deep feature learning is coupled with a least squares regression model through a transformation matrix. Second, taking into account the differences in recognition performance of the same speech features for different dimensional emotion labels, we build label-specific network layers, so that graph matching and least squares regression operate only on the specified dimensional emotion label during deep feature learning. In addition, parameters are weakly shared between the label-specific layers and the top layer of the multi-label deep neural network to exploit information across the dimensional labels. To verify the validity of the proposed algorithm, simulation experiments were carried out in the 2D (Arousal-Valence) and
3D (Arousal-Valence-Power) dimensional emotion spaces, using speech data from the AVEC2012 and IEMOCAP emotional databases.

(3) We propose a novel speech emotion recognition algorithm based on a deep recurrent Restricted Boltzmann Machine. When the recognition target is long-term speech with changing emotion, conventional algorithms cannot track or identify the changing emotion, because the speech emotion feature vectors capture only long-term statistics. In fact, contextual information has an important impact on the speaker's emotional state, so how to use emotion-related context information for feature learning is a key problem to be solved; on the other hand, how to effectively learn a supervised deep network is one of the hotspots of deep feature learning. Motivated by these two issues, we propose a speech emotion recognition algorithm based on a deep recurrent Restricted Boltzmann Machine, which uses a supervised deep network to dynamically learn features of emotional change. Building on voiced-segment detection, the algorithm uses two Gaussian-Bernoulli conditional Restricted Boltzmann Machines to extract high-level emotional information from the emotion labels and from the speech features of the training set, respectively. A recurrent neural network then maps the dynamic relationship between the high-level information extracted from the labels and that extracted from the speech features. Subsequently, we stack, from bottom to top, the conditional Restricted Boltzmann Machine of the speech features, the recurrent neural network, and the Restricted Boltzmann Machine of the emotion labels, and fine-tune the top-layer parameters on the validation set. The recognition performance of the proposed algorithm is verified on speech data from the AVEC2012 and IEMOCAP emotional databases. (4) We propose a
novel speech emotion recognition algorithm based on an ant colony search strategy in an emotional data field. Although current speech emotion recognition algorithms achieve relatively high recognition rates, they do not transfer well to real-life speech emotion recognition systems, for three reasons. First, in acted corpora such as the Danish and Berlin emotional databases, the speech data are collected by acting and belong to typical emotions such as happiness, anger, fear, and sadness. Second, speech data are temporally continuous, so a speech emotion recognition system must process long-term speech, as in a telephone service system. Third, current algorithms cannot effectively exploit the prior information about emotional change; this prior is influenced by the speaker's personality and cultural background and is carried by the labeled training data. To address these issues, we propose a novel speech emotion recognition algorithm, called EDF-ACS, that combines an emotional data field (EDF) with an ant colony search (ACS) strategy. Specifically, the EDF uses a potential function to model the inter-relationships among the turn-based acoustic feature vectors. To perform speech emotion recognition, the artificial ant colony is used to mimic the turn-based acoustic feature vectors, and the canonical ACS strategy determines the movement direction of each artificial ant in the EDF, which is taken as the emotion label of the corresponding turn-based acoustic feature vector. The proposed EDF-ACS algorithm is evaluated on the AVEC2012 dataset, which contains spontaneous, non-prototypical, long-term speech emotion data. The experimental results show that EDF-ACS outperforms existing state-of-the-art algorithms in turn-based speech emotion recognition.
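As a rough illustration of the transfer constraint in contribution (1), the following sketch computes the squared maximum mean discrepancy between training and test features in a combined multiple-kernel space. The RBF base kernels, kernel widths, fixed kernel weights, and synthetic data are all hypothetical; in MFF-TMKL the kernel weights are learned by optimizing the full objective rather than fixed as here.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """RBF kernel matrix between rows of X and rows of Y."""
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def multi_kernel(X, Y, gammas, weights):
    """Convex combination of base RBF kernels (the multiple-kernel space)."""
    return sum(w * rbf_kernel(X, Y, g) for w, g in zip(weights, gammas))

def mmd2(Xs, Xt, gammas, weights):
    """Squared maximum mean discrepancy between source (training) and
    target (test) feature distributions in the combined kernel space."""
    Kss = multi_kernel(Xs, Xs, gammas, weights)
    Ktt = multi_kernel(Xt, Xt, gammas, weights)
    Kst = multi_kernel(Xs, Xt, gammas, weights)
    return Kss.mean() + Ktt.mean() - 2 * Kst.mean()

rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, (100, 8))   # stand-in for training-set features
Xt = rng.normal(0.5, 1.0, (100, 8))   # shifted test-set features
w = [0.5, 0.5]                        # fixed kernel weights (learned in MFF-TMKL)
print(mmd2(Xs, Xs, [0.1, 1.0], w) < mmd2(Xs, Xt, [0.1, 1.0], w))
```

Minimizing this quantity over the kernel weights, alongside the classification loss, is what pulls the training and test feature distributions together in the shared kernel space.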
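The label-specific structure of the multi-label network in contribution (2) can be sketched as a shared trunk with per-label layers, each coupled to a least squares regression head. All dimensions, weights, and targets below are hypothetical placeholders used only to illustrate the data flow; the actual network is trained jointly with graph matching over the dimensional labels, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical sizes: 40-d acoustic features, a 16-d shared layer, and an
# 8-d label-specific layer per emotion dimension (Arousal, Valence).
X = rng.normal(size=(32, 40))                       # mini-batch of features
W_shared = rng.normal(size=(40, 16)) * 0.1          # weakly-shared trunk
heads = {lab: rng.normal(size=(16, 8)) * 0.1
         for lab in ("arousal", "valence")}         # label-specific layers
H = relu(X @ W_shared)                              # shared representation

preds = {}
for lab, W in heads.items():
    Z = relu(H @ W)                                 # label-specific deep feature
    # Least squares regression head coupled to the deep feature
    # (closed form against random placeholder targets, for shape checking).
    y = rng.normal(size=(32, 1))
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    preds[lab] = Z @ beta

print(sorted(preds), preds["arousal"].shape)
```

The design point is that each dimensional label gets its own feature branch, while the shared trunk lets the branches exchange information, mirroring the weakly-shared parameters described above.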
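The building block of the model in contribution (3), a Gaussian-Bernoulli conditional Restricted Boltzmann Machine, conditions both the hidden and visible units on past frames. A minimal sketch of its two conditionals follows, assuming unit visible variance; all dimensions and weights are hypothetical, and training (contrastive divergence) is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Gaussian-Bernoulli conditional RBM: real-valued visibles v (frame
# features), binary hiddens h, conditioned on context u (previous frames)
# through autoregressive weights A and B. Sizes are illustrative only.
rng = np.random.default_rng(1)
n_vis, n_hid, n_cond = 20, 12, 20
W = rng.normal(0, 0.1, (n_vis, n_hid))   # visible-hidden weights
A = rng.normal(0, 0.1, (n_cond, n_hid))  # context -> hidden weights
B = rng.normal(0, 0.1, (n_cond, n_vis))  # context -> visible weights
b = np.zeros(n_hid)                      # hidden biases
c = np.zeros(n_vis)                      # visible biases

def hidden_probs(v, u):
    """P(h=1 | v, u) for a Gaussian-Bernoulli CRBM with unit variance."""
    return sigmoid(v @ W + u @ A + b)

def visible_mean(h, u):
    """Mean of the Gaussian visibles given hiddens and the context."""
    return h @ W.T + u @ B + c

v = rng.normal(size=n_vis)               # current frame features
u = rng.normal(size=n_cond)              # context from previous frames
h = (hidden_probs(v, u) > 0.5).astype(float)   # deterministic sketch
print(hidden_probs(v, u).shape, visible_mean(h, u).shape)
```

In the full model, one such CRBM encodes the speech features and another the emotion labels, with a recurrent network linking their hidden representations over time.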
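The EDF-ACS idea in contribution (4) can be sketched with a data-field potential over turn-based feature vectors and a single ant move. The Gaussian form of the potential and the greedy neighbour choice are simplifying assumptions here; the actual algorithm uses the canonical ACS transition rule with pheromone-weighted probabilities.

```python
import numpy as np

rng = np.random.default_rng(2)
feats = rng.normal(size=(50, 6))   # stand-in for turn-based acoustic vectors

def potential(x, data, sigma=1.0):
    """Data-field potential at x: superposed Gaussian influences of all
    feature vectors (the Gaussian form is an assumption of this sketch)."""
    d2 = np.sum((data - x) ** 2, axis=1)
    return np.sum(np.exp(-d2 / (2.0 * sigma**2)))

def ant_step(i, data, sigma=1.0):
    """Greedy sketch of one ant move: from vector i, jump to the other
    vector sitting at the highest potential (ACS would instead sample
    the move with pheromone-weighted probabilities)."""
    scores = [potential(data[j], data, sigma) if j != i else -np.inf
              for j in range(len(data))]
    return int(np.argmax(scores))

j = ant_step(0, feats)
print(0 <= j < len(feats) and j != 0)
```

Tracking which region of the field each ant settles in is what yields the emotion label for the corresponding turn in the full algorithm.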
Keywords/Search Tags: speech emotion, multiple kernel learning, deep neural network, Restricted Boltzmann Machine, ant colony algorithm, data field