Research On Speech Emotion Recognition Based On Convolution Neural Network Feature Optimization

Posted on: 2019-03-12  Degree: Master  Type: Thesis
Country: China  Candidate: X Zhang
GTID: 2428330548971891  Subject: Circuits and Systems
Abstract/Summary:
With the rapid development of computers and artificial intelligence, natural human-computer interaction has received extensive attention. Language is the main medium of everyday communication. In recent years, thanks to the wide application of deep learning in speech processing, speech recognition has achieved good recognition rates, but it still falls short of natural human-machine interaction, in part because emotion plays an important role in that interaction and machines cannot yet understand the emotional state conveyed by the voice. Speech emotion recognition, which analyzes emotional states from speech signals, is therefore of significant research interest and has attracted growing attention.

In speech emotion recognition, the choice of emotional features is a key factor in the final recognition result. Traditional acoustic features are obtained by further processing of the spectrogram. Because of frame-by-frame processing, the correlation between neighboring spectral features is ignored, and the features are extracted without reference to the target labels, so part of the information in the spectrogram is lost. This thesis therefore proposes to extract task-relevant convolutional features from the spectrogram with a convolutional neural network, fuse them with the traditional acoustic features, and build a multi-level SVM model based on PCA feature optimization in order to improve the recognition performance of the speech emotion recognition system. The main tasks are as follows:

(1) Speech signal preprocessing. After preprocessing steps such as pre-emphasis, framing and windowing, and endpoint detection, acoustic features are extracted from the speech signal, including short-term energy, pitch period, formants, and MFCCs together with their statistics. These form one part of the subsequent feature fusion.

(2) Spectrogram-based feature extraction with a convolutional neural network, and feature fusion. To avoid the negative effect of differing utterance durations and to mitigate the shortage of samples in the data set, the speech signal is first divided into an odd number of equal-length speech segments, and a spectrogram is generated for each segment. A convolutional neural network is then constructed to extract the relevant convolutional features from the spectrograms, and these are fused with the traditional acoustic features to form the final speech emotion feature.

(3) A multi-level SVM model based on PCA feature optimization. To separate similar emotions as far as possible, a multi-level SVM model is constructed by calculating the degree of confusion between classes. PCA dimensionality reduction is applied to the input features of each classifier in the hierarchy in order to optimize the multi-level SVM model.

Comparison of the experimental results with existing research data shows that the proposed algorithm based on convolutional neural network feature optimization is sound and effective.
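The preprocessing in task (1) and the segmentation step in task (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the pre-emphasis coefficient of 0.97, the Hamming window, the 25 ms / 10 ms framing at a 16 kHz sampling rate, and the choice of three segments are all assumptions made for illustration, and endpoint detection, pitch, formant, and MFCC extraction are omitted.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1].

    alpha=0.97 is a common default, not a value taken from the thesis.
    """
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    last_start = len(signal) - frame_len
    return [signal[s:s + frame_len] for s in range(0, last_start + 1, hop)]


def hamming(frame_len):
    """Hamming window coefficients (assumed window type)."""
    if frame_len == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]


def short_term_energy(frames, window):
    """Sum of squared windowed samples for each frame."""
    return [sum((x * w) ** 2 for x, w in zip(frame, window)) for frame in frames]


def split_equal_segments(signal, n_segments):
    """Cut a signal into n equal-length segments, dropping leftover end samples."""
    seg_len = len(signal) // n_segments
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]


# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
SAMPLE_RATE = 16000
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

emphasized = pre_emphasis(tone)
frames = frame_signal(emphasized, frame_len=400, hop=160)  # 25 ms frames, 10 ms hop
energy = short_term_energy(frames, hamming(400))
segments = split_equal_segments(tone, n_segments=3)  # an odd number, as in task (2)
```

In a full pipeline, each entry of `segments` would be turned into a spectrogram and passed to the convolutional network, while per-frame values such as `energy` would feed the statistical acoustic features used in the fusion step.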
Keywords/Search Tags:speech emotion recognition, spectrogram, convolution neural network, feature fusion, principal component analysis