Research On Speech Emotion Recognition Based On Deep Learning

Posted on: 2017-03-16; Degree: Master; Type: Thesis
Country: China; Candidate: C X Zhu
GTID: 2348330491462752; Subject: Information and Communication Engineering
Abstract/Summary:
Speech is one of the ideal means of human-computer interaction, and it carries the speaker's emotional information. The ultimate goal of speech emotion recognition is to enable machines to recognize emotions from speech as humans do, and thereby to achieve better human-computer interaction, which gives the field broad prospects. This thesis studies speech emotion recognition methods based on deep learning, introducing deep learning into speech emotion recognition algorithms, and proposes several improved methods to further raise recognition accuracy. The main work of this thesis is as follows:
(1) The research background, significance and history of speech emotion recognition are discussed, and existing research on the definition of emotion, emotional databases, emotional features and emotion classifiers is summarized.
(2) A Chinese emotional speech database was established for the experiments. It contains seven emotions (fear, disgust, joy, anger, boredom, neutral and sadness), all of which were tested for validity. The speech signals are then preprocessed and an emotion feature vector is extracted, including frame energy, zero-crossing rate, pitch frequency, sub-space energy, MFCC and frequency coefficients of the speech, among others. The spectrogram, a two-dimensional representation of speech, is also studied. All of these form the basis of speech emotion recognition.
(3) The fundamentals of deep learning are discussed, including the artificial neural network (ANN), softmax and their training methods, which are the building blocks of the deep models discussed later. The basic principles of the stacked denoising autoencoder (SDA) are described, SDA is proposed as a means of reducing the dimensionality of feature vectors, and particular attention is paid to its ability to extract deep features. Experiments comparing SDA with traditional dimensionality-reduction algorithms demonstrate its distinctive properties.
In addition, to make better use of the labels of the training examples, this chapter studies a method for further refining emotion-related features (DD-AEF), and experiments show that DD-AEF outperforms the other features. Finally, a new spectral feature called SDACC is proposed and shown to perform better than HuWSF.
(4) This chapter discusses the basic theory and advantages of the convolutional neural network (CNN) and studies how to use spectrograms as CNN inputs for speech emotion recognition. Four ways of dividing the spectrogram are compared, and dividing it into separate patches is found to work best. Building on this, and noting that more kinds of convolution kernels yield better features because they attend to multiple scales, we study a model that uses two kinds of convolution kernels to extract speech emotion feature vectors. Based on this method of extracting emotion-salient features, CNN-BN features are proposed, which both strengthen the relation between the features and the target labels and reduce the feature dimensionality. Finally, the relation between the dimensionality of the CNN-BN features and the emotion recognition rate is discussed.
(5) This chapter discusses the basic theory of the deep belief network (DBN) and how it is trained. As with SDA, the advantages and disadvantages of DBN relative to other dimensionality-reduction algorithms are first examined and verified experimentally. DBN is then used to extract DBNCC features, analogous to SDACC. To obtain better speech emotion features, a new method is proposed that splits energy images into patches overlapping by half along the frequency axis, and experiments demonstrate its strong performance.
The main contributions of this thesis are as follows: (1) SDACC is proposed on the basis of HuWSF, and DBNCC is likewise proposed and validated; (2) CNN-BN features are proposed, based on a CNN with two kinds of convolution kernels and on emotion-salient features.
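As a concrete illustration of two of the frame-level features listed in item (2), frame energy and zero-crossing rate, here is a minimal Python sketch. It assumes a mono signal held as a list of floats; the frame length and hop (400 and 160 samples, i.e. 25 ms and 10 ms at an assumed 16 kHz sampling rate) and all function names are illustrative assumptions rather than the thesis's settings, and no windowing or pre-emphasis is applied.

```python
import math


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames of frame_len samples,
    advancing hop samples each time; a trailing partial frame is dropped."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def frame_energy(frame):
    """Short-time energy: the sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs that change sign (0 counts as positive)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def frame_features(signal, frame_len=400, hop=160):
    """Return one (energy, zcr) pair per frame. The defaults (400/160 samples,
    i.e. 25 ms / 10 ms at 16 kHz) are illustrative assumptions."""
    return [(frame_energy(f), zero_crossing_rate(f))
            for f in frame_signal(signal, frame_len, hop)]


if __name__ == "__main__":
    sr = 16000  # assumed sampling rate
    tone = [math.sin(2 * math.pi * 100 * n / sr) for n in range(1600)]  # 100 Hz sine
    for energy, zcr in frame_features(tone):
        print(f"energy={energy:.1f}  zcr={zcr:.3f}")
```

Energy here is a plain sum of squares; many front ends use the logarithm of a windowed energy instead, which would be an equally reasonable choice for this sketch.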
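Item (5) splits energy images into patches that overlap by half along the frequency axis. The sketch below shows one way to do that, assuming the energy image is a spectrogram-like matrix stored as a list of rows (one row per frequency bin, each row holding that bin's values over time) and an even patch height. The function name, patch size and handling of leftover bins are hypothetical, since the abstract does not specify them.

```python
def split_freq_patches(energy_image, patch_height, overlap=True):
    """Split an energy image (a list of rows, one row per frequency bin)
    into patches along the frequency axis.

    With overlap=True consecutive patches share half of their rows
    (stride = patch_height // 2); with overlap=False they are disjoint.
    Bins at the end of the frequency axis that do not fill a whole
    patch are dropped (an assumption of this sketch).
    """
    if patch_height < 2 or patch_height % 2 != 0:
        raise ValueError("patch_height must be an even integer >= 2")
    stride = patch_height // 2 if overlap else patch_height
    n_bins = len(energy_image)
    return [energy_image[start:start + patch_height]
            for start in range(0, n_bins - patch_height + 1, stride)]


if __name__ == "__main__":
    # Toy 8-bin x 3-frame image: every value in row f equals f.
    image = [[f] * 3 for f in range(8)]
    for patch in split_freq_patches(image, 4):
        print([row[0] for row in patch])  # prints [0, 1, 2, 3], then [2, 3, 4, 5], then [4, 5, 6, 7]
```

Setting `overlap=False` on the same toy image yields two disjoint patches (bins 0-3 and 4-7) instead of three, so the half overlap roughly doubles the number of patches and lets content near a patch border also fall inside a neighbouring patch. That is a plausible motivation, not one stated in the abstract.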
Keywords/Search Tags: speech emotion recognition, stacked denoising autoencoder (SDA), convolutional neural network (CNN), deep belief network (DBN), deep learning