
Speech Emotion Recognition Of Master Thesis

Posted on: 2022-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: R M Mao
Full Text: PDF
GTID: 2518306740979309
Subject: Applied Statistics
Abstract/Summary:
Speech is one of the main channels of everyday communication and carries a great deal of information, including the speaker's emotional state. As speech has become one of the main modes of human-computer interaction, speech emotion recognition technology has brought convenience to many aspects of daily life. However, the technology is not yet mature, and much room for research remains. This thesis studies speech emotion recognition on public data sets using two types of model: machine learning and deep learning.

For the machine learning approach, the public data set EMO-DB is used as the experimental data. Six statistics (maximum, minimum, mean, variance, mean of the first-order difference, and rate of change) of energy, zero-crossing rate (ZCR), Teager energy operator (TEO) and Mel-frequency cepstral coefficients (MFCC) are extracted as the original feature set. PCA is used to reduce the dimension while retaining 80% of the information, and the new feature set formed by the first 42 principal components is used as the input to the SVM. It is then shown that the emotion classification results of an SVM with a radial basis function (RBF) kernel differ greatly under different parameter values. As an innovation, this thesis optimizes the penalty coefficient and the Gaussian kernel bandwidth with grid search, a classical genetic algorithm and differential evolution. Finally, the classification results of the three parameter optimization algorithms are compared, and the advantages and disadvantages of each are pointed out.

For the deep learning approach, the public data set IEMOCAP is used as the experimental data, with 3-D log-mels as the model input. As a second innovation, the ReLU activation function in the Resnet18 model is replaced with Leaky-ReLU, and the optimized Resnet18 is used as the classification model. The model is built with the TensorFlow module in Python, and the classification results are compared with the baseline ACRNN. The results show that the optimized Resnet18 model proposed here improves the average recall rate of emotion classification on IEMOCAP by 3.02%. Finally, it is compared with the experimental results of the SVM tuned by grid search, which demonstrates the superiority of the improved Resnet18 and the necessity of using deep learning models for speech emotion recognition.

Innovation points of this thesis: 1) grid search, a classical genetic algorithm and differential evolution are each used to optimize the penalty coefficient and the Gaussian kernel bandwidth of the SVM classification model, and the results are compared; 2) the ReLU activation function in the Resnet18 model is changed to Leaky-ReLU to optimize the model structure, and the optimized Resnet18 is then used as the classification model to improve the classification recall rate.
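To make the first innovation point more concrete, the following is a minimal sketch of how differential evolution could search over the two SVM parameters (penalty coefficient and Gaussian kernel bandwidth). It is not the thesis's implementation. The DE/rand/1/bin variant, the population size, the scale factor, the crossover rate, the search bounds, and the function `toy_error` are all assumptions made for illustration. In particular, `toy_error` is a hypothetical stand-in for the cross-validated classification error of an RBF-kernel SVM, which the abstract does not specify.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, scale=0.5,
                           crossover_rate=0.9, generations=200, seed=0):
    """Minimise `objective` over the box `bounds` using DE/rand/1/bin.

    All hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Initial population: uniform samples inside the search box.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.sample(others, 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for j in range(dim):
                if rng.random() < crossover_rate or j == forced:
                    value = pop[a][j] + scale * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][j]
                trial.append(value)
            # Selection: keep the trial only if it is at least as good.
            trial_score = objective(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_error(params):
    """Hypothetical stand-in for cross-validated SVM error.

    `params` is (log10 of the penalty coefficient, log10 of the kernel
    bandwidth parameter); the minimum at (1, -2) is arbitrary.
    """
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


# Search log10(C) in [-3, 3] and log10(gamma) in [-5, 1] (assumed ranges).
best_params, best_error = differential_evolution(toy_error, [(-3, 3), (-5, 1)])
print(best_params, best_error)
```

Searching in log space is a common choice for these two parameters because useful values typically span several orders of magnitude. In a real experiment, `toy_error` would be replaced by a function that trains the SVM with the candidate parameters and returns its validation error.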
Keywords/Search Tags: classical genetic algorithm, differential evolution (DE), 3-D log-mels, improved Resnet18, TensorFlow