
Research On Speech Emotion Recognition Based On Multimodal Information Fusion

Posted on: 2022-06-14    Degree: Master    Type: Thesis
Country: China    Candidate: D L Jiang    Full Text: PDF
GTID: 2518306482455114    Subject: Computer application technology
Abstract/Summary:
Communication is the primary way for human beings to express their thoughts, and among all forms of communication, spoken language is the most widespread and effective. IoT applications are developing rapidly, ranging from simple wearable devices and small components to complex autonomous vehicles and a wide variety of automation equipment, and they bring great convenience to daily life. These intelligent applications are interactive: users must issue specific operation instructions, and voice input is the main channel through which an intelligent device is driven. A speech perception module can detect the speaker's gender, age, language, emotion, and other information, which creates the necessary conditions for computer applications to understand human speech. To analyse the speaker's emotional state, many applications run an existing speech recognition system and an emotion detection system side by side. The performance of the emotion detection system reflects how well an IoT application serves its users and indicates where it can be improved.

Improving the multimodal fusion mechanism is a decisive factor in improving the performance of an emotion recognition system. Most existing multimodal emotion recognition systems simply concatenate the features extracted from different modalities. For traditional classification algorithms, the main problem with this approach is that the information carried by the different modalities can conflict or be redundant. In addition, concatenating the feature vectors of different modalities into a single high-dimensional vector ignores the implicit correlation between the modalities. The primary task is therefore to minimise the impact of information conflict and redundancy between the audio and visual modalities on the multimodal emotion recognition system.

To address these problems, this research proposes a new hybrid fusion method that combines audiovisual content with user comment text. The method fuses the audio and visual signals at the feature level in a latent space, computing the correlation between the two modalities to remove redundant features, and then uses Dempster-Shafer (DS) evidence theory to fuse the audiovisual and text modalities at the decision level. This resolves the information redundancy and conflict between audio and video. Within the proposed method, Marginal Fisher Analysis (MFA) is introduced and compared with Cross Modal Factor Analysis (CFA) and Canonical Correlation Analysis (CCA); the experimental results show that the proposed method performs better. Although some earlier studies address the redundancy problem in feature-level fusion by preserving the statistical correlation between modalities, they do not extend it to decision-level fusion: existing methods either use feature-level latent-space fusion or use evidence theory to fuse the audiovisual and text modalities, but not both. Experiments on the DEAP dataset show that the proposed method outperforms ordinary decision-level fusion and fusion without a latent space, and that MFA yields better feature-level audiovisual fusion than CFA and CCA.
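For the feature-level stage, the abstract compares Marginal Fisher Analysis (MFA) against CCA and CFA in a latent-space fusion scheme, but gives no implementation details. The thesis's MFA formulation is not reproduced here; as a minimal sketch of the general idea, the snippet below uses the CCA baseline named in the abstract (via scikit-learn) to project two modalities into a correlated latent space before concatenation. The feature dimensions, sample counts, and random data are assumptions for illustration only.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Assumed shapes: 500 samples, 128-dim audio features, 256-dim visual features.
rng = np.random.default_rng(0)
audio_feats = rng.standard_normal((500, 128))
visual_feats = rng.standard_normal((500, 256))

# Project both modalities into a shared latent space where their
# correlation is maximised, then concatenate the projections as the
# fused feature-level representation.
cca = CCA(n_components=32)
audio_latent, visual_latent = cca.fit_transform(audio_feats, visual_feats)
fused = np.hstack([audio_latent, visual_latent])
print(fused.shape)  # (500, 64)
```

In the thesis's setting, the correlation measured in this latent space is also used to discard redundant dimensions before classification; that selection step is not shown here.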
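The abstract then applies DS evidence theory to fuse the audiovisual branch with the text branch at the decision level. The sketch below shows only the generic Dempster rule of combination applied to two hypothetical mass functions over emotion classes; the class labels, mass values, and function name are illustrative assumptions, not taken from the thesis.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dicts mapping frozenset hypotheses to
    masses) with Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("Total conflict: sources cannot be combined")
    # Normalise by 1 - K, where K is the total conflicting mass.
    return {h: m / (1.0 - conflict) for h, m in combined.items()}

# Hypothetical masses from the audiovisual and text branches
# over three emotion classes (values are made up).
audio_visual = {frozenset({"happy"}): 0.6, frozenset({"sad"}): 0.3,
                frozenset({"happy", "sad", "neutral"}): 0.1}
text = {frozenset({"happy"}): 0.5, frozenset({"neutral"}): 0.3,
        frozenset({"happy", "sad", "neutral"}): 0.2}

fused = dempster_combine(audio_visual, text)
decision = max(fused, key=fused.get)
print(fused, decision)
```

The hypothesis receiving the largest combined mass would then be taken as the fused emotion decision.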
Keywords/Search Tags: speech emotion recognition, decision-level fusion, latent space plane, multimodality, Dempster-Shafer