Research On Emotion Recognition Based On Audio And Video

Posted on:2021-02-06

Degree:Master

Type:Thesis

Country:China

Candidate:C Y Xin

Full Text:PDF

GTID:2428330611980415

Subject:Master of Engineering-Field of Control Engineering

Abstract/Summary:

Emotion recognition technology has broad application prospects in medical treatment, education, services and human-computer interaction. As an important research field of artificial intelligence, it has made great progress in recent years. However, emotional states are complex and diverse, and the way individuals express emotion is influenced by culture and personality. As a result, emotion recognition still suffers from low recognition rates, poor performance on dynamic data, and restrictive application conditions. This thesis studies emotion recognition based on audio and video data. For video-based facial expression recognition, a long short-term memory (LSTM) neural network and a three-dimensional convolutional neural network (3D CNN) are each evaluated, because LSTM networks are well suited to time-series data, while 3D convolutional networks can mine the information between image frames. The video data is first preprocessed and the cropped face images are saved; HOG features and geometric features are then extracted. The LSTM network takes the HOG features, the geometric features, and their concatenation as inputs. The 3D convolutional network learns complex features automatically from the video frames themselves and is then trained on them. For audio, the emotion recognition model combines hand-crafted feature extraction with an LSTM network. The audio data is first preprocessed; features such as short-term zero-crossing rate, short-term energy and Mel-frequency cepstral coefficients (MFCC) are then extracted with the openSMILE toolkit, and an LSTM network model is constructed and trained. Building on the audio emotion recognition model and the facial expression recognition model, a Bayesian fusion method is used to obtain the final emotional state recognition result. The experiments follow the methods above and use the CHEAVD 2.0 database published by the Chinese Academy of Sciences. The video-based model and the audio-based model show different strengths across emotion classes. The experimental results show that multimodal fusion significantly improves the recognition rate.
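
The decision-level fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact Bayesian fusion rule used in the thesis, so the code below assumes a common variant: the audio and video models each output per-class posterior probabilities, the two modalities are treated as conditionally independent given the emotion class, and the fused posterior is proportional to P(c | audio) * P(c | video) / P(c). The function name `bayesian_fusion`, the three-class label set, and the example probabilities are hypothetical and are not taken from the thesis.

```python
from typing import List, Optional, Sequence


def bayesian_fusion(
    audio_post: Sequence[float],
    video_post: Sequence[float],
    prior: Optional[Sequence[float]] = None,
) -> List[float]:
    """Fuse per-class posteriors from two modalities.

    Assumes the modalities are conditionally independent given the class,
    so that P(c | audio, video) is proportional to
    P(c | audio) * P(c | video) / P(c).
    """
    n = len(audio_post)
    if len(video_post) != n:
        raise ValueError("both modalities must score the same classes")
    if prior is None:
        # Assumption: uniform class prior when none is supplied.
        prior = [1.0 / n] * n
    if len(prior) != n:
        raise ValueError("prior must have one entry per class")

    eps = 1e-12  # guards against division by a zero prior
    scores = [a * v / max(p, eps) for a, v, p in zip(audio_post, video_post, prior)]
    total = sum(scores)
    if total <= 0.0:
        raise ValueError("the two modalities assign zero probability to every class")
    # Normalise so the fused scores form a probability distribution.
    return [s / total for s in scores]


# Hypothetical three-class example (labels are illustrative only).
labels = ["happy", "angry", "neutral"]
audio = [0.5, 0.3, 0.2]   # posterior from the audio LSTM (made-up values)
video = [0.2, 0.6, 0.2]   # posterior from the video model (made-up values)

fused = bayesian_fusion(audio, video)
predicted = labels[fused.index(max(fused))]
print(predicted, [round(p, 4) for p in fused])
```

In this made-up example the audio model prefers the first class and the video model the second; the fused posterior favours the class that both models find plausible, which is the kind of complementarity the abstract attributes to the two modalities.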

Keywords/Search Tags:

LSTM, Three-dimensional convolutional neural network (C3D), Multimodal fusion

Related items

1	Research On Multimodal Emotion Recognition Based On The Fusion Of Temporal And Spatial Features
2	Document Image Classification Based On Multimodal Feature Fusion
3	Research On Automatic Segmentation Method Of Multimodal Image Based On Convolutional Neural Network
4	Research On Key Technologies Of High Performance Accelerator For Convolution And Recurrent Neural Networks
5	Design And Development Of Dangerous Behavior Detection System Based On Multimodal Information Fusion
6	Research On Algorithms For Multimodal Sentiment Analysis Based On Interaction Fusion
7	Multimodal Fusion Based Weakly Supervised Semantic Segmentation Method
8	Research And Application Of Multimodal Feature Fusion Based On Optical Neural Networks
9	Based On Multimodal Feature Emotion Recognition Research
10	Medical Image Segmentation Based On Deep Learning