
Speech Emotion Recognition Based On Deep Learning

Posted on: 2022-11-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2518306764978149
Subject: Automation Technology
Abstract/Summary:
Speech emotion recognition usually refers to the process by which a machine automatically recognizes human emotions from speech. It is widely used in human-computer interaction systems such as customer service centers, in-vehicle systems, and smart speakers. In recent years, with industry's growing demand for more intelligent human-computer interaction, speech emotion recognition has gradually become a research hotspot. Previous studies have usually applied deep learning to speech emotion recognition with convolutional neural networks or recurrent neural networks. Building on time-delay neural networks and bidirectional encoder representations, this thesis makes the following three contributions to speech emotion recognition:

(1) Based on ECAPA-TDNN (Emphasized Channel Attention, Propagation and Aggregation in TDNN), an ECAPA-TDNN-LSTM model is proposed and applied to speech emotion recognition. On the IEMOCAP dataset, the ECAPA-TDNN-LSTM model achieves a weighted accuracy (WA) of 72.1% and an unweighted accuracy (UA) of 69.0%. Compared with a convolutional neural network (CNN) benchmark model, this is a relative improvement of 9.15% in WA and 5.73% in UA; compared with the ECAPA-TDNN model, a relative improvement of 4.34% in WA and 3.92% in UA.

(2) Assuming that the text information provided in the IEMOCAP dataset comes from speech recognition results, the BERT pre-trained model, based on bidirectional encoder representations, is fine-tuned for text emotion classification and achieves 66.5% WA and 67.6% UA.

(3) Using decision-level fusion (sketched below), the ECAPA-TDNN-LSTM model from (1) and the fine-tuned BERT model from (2) are combined into the ETL-BERT model, which achieves 80.5% WA and 79.9% UA. Compared with ECAPA-TDNN-LSTM, this is a relative improvement of 11.65% in WA and 15.80% in UA; compared with the BERT model, 21.05% in WA and 18.20% in UA.
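
The decision-level fusion in (3) combines the class posterior probabilities produced by the acoustic model and the text model before the final decision is taken. The abstract does not state the fusion rule, the fusion weights, or the emotion class set, so the following is only a minimal sketch of weighted score averaging; the four classes, the probability values, and the equal weights are assumptions made for illustration, not the thesis configuration.

    import numpy as np

    # Decision-level (late) fusion sketch: combine the class posteriors of the
    # acoustic model and the text model, then take the argmax as the final label.
    # The four emotion classes, the probability values, and the 0.5/0.5 weights
    # below are illustrative assumptions, not values reported in the thesis.
    labels = ["angry", "happy", "neutral", "sad"]

    p_acoustic = np.array([0.10, 0.55, 0.25, 0.10])  # e.g. ECAPA-TDNN-LSTM posteriors
    p_text     = np.array([0.05, 0.40, 0.45, 0.10])  # e.g. fine-tuned BERT posteriors

    w_acoustic, w_text = 0.5, 0.5                    # assumed fusion weights
    p_fused = w_acoustic * p_acoustic + w_text * p_text

    print(labels[int(np.argmax(p_fused))])           # fused decision: "happy"

Weighted averaging of posteriors is one common form of decision-level fusion; majority voting or a small learned combiner over the two score vectors would fit the same interface.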
Keywords/Search Tags:Deep Learning, Speech Emotion Recognition, Model Fusion