
Speech Emotion Recognition Based On Deep Learning And Multi-Feature Fusion

Posted on: 2022-12-30    Degree: Master    Type: Thesis
Country: China    Candidate: Z Q Bao    Full Text: PDF
GTID: 2518306767977499    Subject: Automation Technology
Abstract/Summary:
With the rapid development of speech emotion recognition, the technology has gradually entered a variety of production scenarios and plays an irreplaceable role in some fields. For example, it can assist teachers by monitoring the state of students as they answer questions in class; it can remind drivers to drive safely by combining their voice, facial expressions, and behavior; and it can support sentiment analysis of different dialects to help achieve more accurate translation.

When applying deep learning to speech emotion recognition, most researchers build network models directly on hand-crafted feature sets and spectrograms extracted from the original speech. Few researchers perform speech emotion recognition directly on the raw speech signal with deep learning, possibly because a single utterance contains so many samples that a deep network struggles to extract emotional information effectively. At the same time, much current work relies only on hand-crafted features and spectrogram features to build recognition models. These manually extracted features lose part of the information contained in the original speech, which in turn degrades emotion recognition.

To address the difficulty of extracting effective deep features from raw speech, this thesis designs a feature extraction method based on convolutional neural networks that simulates the filters used in speech processing. A parallel combination of one-dimensional convolution and dilated convolution extracts the local and global features of the raw speech, while also encouraging the model to learn more diverse speech representations. Secondly, building on a study of the speech spectrogram, deep semantic information relating frequency and amplitude is extracted from the spectrogram with an unsupervised feature extraction method. Thirdly, a network model is designed to learn deep features from the hand-crafted features: through multi-dimensional learning it captures both the temporal information within the hand-crafted features and the relationships between them. Finally, a two-stage training strategy for multi-feature fusion is proposed. In the first stage, the individual models are trained separately to their best performance; in the second stage, joint training with feature fusion fine-tunes the model parameters. The result is a speech emotion recognition model driven jointly by the raw speech, the spectrogram, and the hand-crafted features.

The experimental data come from the IEMOCAP dataset, and the hand-crafted features are extracted with the eGeMAPS feature set in the openSMILE toolkit. The multi-feature speech emotion recognition model based on deep learning extracts deep feature information from the raw speech, the spectrogram, and the hand-crafted features, respectively, and achieves 65.3% unweighted accuracy and 64.0% weighted accuracy on the IEMOCAP dataset. This demonstrates that combining raw speech, hand-crafted features, and spectrograms through deep learning allows the representations to guide one another to a certain extent and leads to better results.
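The abstract describes a raw-waveform front end that pairs an ordinary one-dimensional convolution (local features) with a dilated convolution (global features) in parallel. The sketch below is a minimal illustration of that idea, assuming PyTorch as the framework; the channel counts, kernel sizes, strides, and dilation rate are illustrative assumptions, not the configuration used in the thesis.

```python
# Hypothetical sketch of a parallel 1-D / dilated-convolution front end for
# raw speech, in the spirit of the method described in the abstract.
import torch
import torch.nn as nn

class ParallelConvFrontEnd(nn.Module):
    def __init__(self, out_channels: int = 64):
        super().__init__()
        # Local branch: ordinary 1-D convolution over the raw waveform.
        self.local_branch = nn.Sequential(
            nn.Conv1d(1, out_channels, kernel_size=11, stride=5, padding=5),
            nn.BatchNorm1d(out_channels),
            nn.ReLU(),
        )
        # Global branch: dilated 1-D convolution widens the receptive field
        # without adding parameters, capturing longer-range context.
        self.global_branch = nn.Sequential(
            nn.Conv1d(1, out_channels, kernel_size=11, stride=5,
                      dilation=4, padding=20),
            nn.BatchNorm1d(out_channels),
            nn.ReLU(),
        )

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, 1, samples)
        local = self.local_branch(waveform)
        global_ = self.global_branch(waveform)
        # Concatenate along the channel axis so later layers see both the
        # local and the global view of the signal.
        return torch.cat([local, global_], dim=1)
```

With the padding chosen above, both branches produce feature maps of the same length, so concatenation simply doubles the channel dimension before the deeper emotion-recognition layers.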
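The hand-crafted features come from the eGeMAPS feature set of the openSMILE toolkit. The following is a minimal sketch of one way to obtain those features, assuming the Python wrapper of openSMILE (the "opensmile" package); the thesis may equally have used the SMILExtract command-line tool with the eGeMAPS configuration file, and "utterance.wav" is a placeholder path, not an IEMOCAP file.

```python
# Extract eGeMAPS functionals for one utterance with the openSMILE Python API.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,      # eGeMAPS functionals
    feature_level=opensmile.FeatureLevel.Functionals,
)

features = smile.process_file("utterance.wav")  # placeholder audio path
print(features.shape)  # one 88-dimensional functional vector per utterance
```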
Keywords/Search Tags:Speech Emotion Recognition, Unsupervised Learning, Long Short-Term Memory, Attention Mechanism, Multi-Feature Fusion