Research On Speech Emotion Recognition Method Based On Multi-feature Fusion

Posted on: 2022-05-09  Degree: Master  Type: Thesis
Country: China  Candidate: Y Wang  Full Text: PDF
GTID: 2518306323993779  Subject: Computer Science and Technology
Abstract/Summary:
With the continuous development of artificial intelligence technology, more natural human-computer interaction has attracted wide attention. Speech is one of the most common modes of human-computer interaction, and its key requirement is that the machine can fully understand human emotion. Research on speech emotion recognition has therefore become an important task in the field of artificial intelligence. Current research on speech emotion recognition faces three main problems. First, most existing work considers only acoustic features or only semantic features; the emotional information captured is insufficient because the two are not combined. Second, most feature-fusion methods currently used in speech emotion recognition simply concatenate features, ignoring the differences between feature types. Third, existing speech emotion datasets are small, which leads to overfitting of speech emotion recognition models. To address these problems, this thesis carries out the following three tasks (minimal code sketches illustrating each step follow this abstract):

(1) Extraction of multiple emotional features based on acoustic and semantic information. Two types of emotional features are extracted: acoustic features and semantic features. For the acoustic features, in order to describe emotional information from different angles, this thesis extracts high-level statistical functions (HSFs) computed from low-level descriptors (LLDs), uses a DNN to extract deep features from spectral features, and uses a CNN to extract deep features from filter-bank features. For the semantic features, a LAS automatic speech recognition model built on the encoder-decoder framework serves as the semantic feature extractor, and a BiLSTM learns higher-level features from the output of the encoder.

(2) A feature-level/decision-level fusion model based on an attention mechanism. First, the three types of acoustic features are treated as independent features and fused with the semantic features at the feature level, where a Huffman-tree construction method is introduced to generate the feature-level fused features, which are then used for speech emotion recognition. Second, decision-level fusion with weighted voting is applied to exploit the complementary strengths of the different features and thereby improve the recognition rate. Finally, a feature-level/decision-level fusion model based on the attention mechanism is proposed: the attention mechanism assigns weights to the different results and integrates the outputs obtained from feature-level fusion and decision-level fusion.

(3) Three data augmentation methods for enlarging speech emotion datasets. Because existing speech emotion datasets are small and emotion annotation is subjective, constructing new speech emotion datasets is costly. This thesis expands the data by adding noise, perturbing the audio duration, and modifying the audio, which reduces this cost and improves the accuracy of speech emotion recognition. The proposed method achieves a recognition accuracy of 76.85% on IEMOCAP, an improvement of 3.85%, which demonstrates its effectiveness.
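What follows is a minimal sketch of the acoustic side of task (1), assuming librosa is available: frame-level LLDs (here MFCCs, frame energy, and zero-crossing rate; the abstract does not specify the thesis's exact LLD set) are summarized into utterance-level HSFs, and a log filter-bank map is prepared as input for the CNN branch.

```python
# Sketch only: LLD choices and statistics are illustrative, not the thesis's set.
import numpy as np
import librosa

def extract_hsfs(wav_path, sr=16000, n_mfcc=13):
    """Frame-level LLDs -> utterance-level statistics (mean/std/min/max)."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, T)
    rms = librosa.feature.rms(y=y)                           # frame energy, (1, T)
    zcr = librosa.feature.zero_crossing_rate(y)              # (1, T)
    llds = np.vstack([mfcc, rms, zcr])                       # (D, T)
    stats = [llds.mean(axis=1), llds.std(axis=1),
             llds.min(axis=1), llds.max(axis=1)]
    return np.concatenate(stats)                             # fixed-length HSF vector

def extract_fbank(wav_path, sr=16000, n_mels=40):
    """Log mel filter-bank map, the usual 2-D input for a CNN branch."""
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel)                          # (n_mels, T)
```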
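For the semantic branch, a sketch of how a BiLSTM can pool the hidden states of a pretrained LAS encoder into a single semantic feature vector; `SemanticHead`, `enc_dim`, and the mean-pooling step are assumptions for illustration, not the thesis implementation.

```python
# Assumes a LAS encoder pretrained elsewhere produces (batch, T, enc_dim) states.
import torch
import torch.nn as nn

class SemanticHead(nn.Module):
    def __init__(self, enc_dim=256, hidden=128):
        super().__init__()
        self.bilstm = nn.LSTM(enc_dim, hidden,
                              batch_first=True, bidirectional=True)

    def forward(self, enc_out):
        # enc_out: (batch, T, enc_dim) hidden states from the LAS encoder
        h, _ = self.bilstm(enc_out)   # (batch, T, 2*hidden)
        return h.mean(dim=1)          # mean-pool over time -> (batch, 2*hidden)
```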
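For task (2), a sketch of the two fusion paths: weighted voting for decision-level fusion, and a small attention module that weights and combines the class-probability outputs of the feature-level and decision-level paths. The Huffman-tree feature construction and the individual branch classifiers are elided, and all names here are hypothetical.

```python
import torch
import torch.nn as nn

def weighted_vote(probs, weights):
    """Decision-level fusion: weighted average of per-branch class probabilities."""
    # probs: list of (batch, n_classes) tensors; weights: scalars summing to 1
    return sum(w * p for w, p in zip(weights, probs))

class AttentionFusion(nn.Module):
    """Attention over the two fusion paths' class-probability outputs."""
    def __init__(self, n_classes=4):
        super().__init__()
        # scores one attention logit per path from its probability vector
        self.score = nn.Linear(n_classes, 1)

    def forward(self, p_feat, p_dec):
        # p_feat, p_dec: (batch, n_classes) outputs of the feature-level
        # and decision-level paths, respectively
        stacked = torch.stack([p_feat, p_dec], dim=1)      # (batch, 2, n_classes)
        alpha = torch.softmax(self.score(stacked), dim=1)  # (batch, 2, 1) path weights
        return (alpha * stacked).sum(dim=1)                # (batch, n_classes)
```

Because the attention weights sum to 1 across the two paths, the fused output remains a valid probability distribution over emotion classes.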
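For task (3), a sketch of the three augmentation styles named above, with pitch shifting standing in as one plausible reading of "modifying the audio"; the SNR, rate, and semitone values are illustrative only.

```python
import numpy as np
import librosa

def add_noise(y, snr_db=20.0):
    """Mix in white noise at a target signal-to-noise ratio (dB)."""
    noise = np.random.randn(len(y))
    scale = np.sqrt((y ** 2).mean() / (10 ** (snr_db / 10) * (noise ** 2).mean()))
    return y + scale * noise

def speed_perturb(y, rate=1.1):
    """Time-stretch the waveform (changes duration, keeps pitch)."""
    return librosa.effects.time_stretch(y, rate=rate)

def pitch_shift(y, sr=16000, n_steps=2):
    """Shift pitch by n_steps semitones (keeps duration)."""
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)
```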
Keywords/Search Tags:Speech emotion recognition, Acoustic features, Semantic features, Feature level-Decision level fusion, Data augmentation