Research On Emotion Recognition Based On Audio And Video

Posted on: 2021-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Xin
Full Text: PDF
GTID: 2428330611980415
Subject: Master of Engineering - Field of Control Engineering
Abstract/Summary:
Emotion recognition technology has broad application prospects in medical treatment, education, services, and human-computer interaction. As an important research field of artificial intelligence, it has made great progress in recent years. However, emotional states are complex and diverse, and the way an individual expresses emotion is influenced by culture and personality. As a result, emotion recognition still suffers from problems such as low recognition rates, poor dynamic recognition performance, and limited application conditions.

This thesis studies emotion recognition based on audio and video data. For video-based facial expression recognition, a long short-term memory (LSTM) neural network and a three-dimensional (3D) convolutional neural network are each tried, because LSTM networks are mostly used for time-series data, while 3D convolutional neural networks can mine the information between image frames. The data are first preprocessed and the cropped face images are saved; histogram of oriented gradients (HOG) features and geometric features are then extracted. The LSTM network takes the HOG features, the geometric features, and their concatenation as inputs. The 3D convolutional neural network uses the video images directly to generate complex features automatically, and is then trained.

For audio, the emotion recognition model combines hand-crafted feature extraction with an LSTM neural network. The audio data are first preprocessed; features such as the short-term zero-crossing rate, short-term energy, and Mel-frequency cepstral coefficients (MFCCs) are then extracted with the openSMILE toolkit, and an LSTM network model is constructed and trained. Building on the neural network models for audio emotion recognition and facial expression recognition, a Bayesian fusion method is used to obtain the final emotional state recognition result.

The methods above are evaluated on the CHEAVD 2.0 database published by the Chinese Academy of Sciences. The video-based and audio-based models have different strengths in emotion classification. The experimental results show that multimodal fusion significantly improves the recognition rate.
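
The abstract does not spell out the fusion rule, so the following is only a minimal sketch of one common form of Bayesian decision-level fusion, not the thesis's actual implementation. It assumes that each modality's model outputs a posterior distribution over the same emotion classes and that the two modalities are conditionally independent given the class, so the fused posterior is proportional to the product of the two posteriors divided by the class prior. The function name `bayesian_fusion`, the three class labels, and the probability values are hypothetical and chosen purely for illustration.

```python
from typing import List, Optional, Sequence


def bayesian_fusion(
    p_audio: Sequence[float],
    p_video: Sequence[float],
    prior: Optional[Sequence[float]] = None,
) -> List[float]:
    """Fuse two per-class posteriors under a conditional-independence assumption.

    fused(c) is proportional to p_audio(c) * p_video(c) / prior(c).
    If no prior is given, a uniform prior is assumed.
    """
    n = len(p_audio)
    if n == 0 or len(p_video) != n:
        raise ValueError("both posteriors must be non-empty and of equal length")
    if prior is None:
        prior = [1.0 / n] * n
    if len(prior) != n:
        raise ValueError("prior must have one entry per class")

    eps = 1e-12  # guards against division by a zero prior
    scores = [
        pa * pv / max(pr, eps)
        for pa, pv, pr in zip(p_audio, p_video, prior)
    ]
    total = sum(scores)
    if total <= 0.0:
        raise ValueError("the modalities assign zero joint probability to every class")
    # Normalise so the fused scores form a probability distribution.
    return [s / total for s in scores]


# Hypothetical example: three illustrative classes, made-up model outputs.
labels = ["happy", "angry", "neutral"]
audio_posterior = [0.5, 0.3, 0.2]  # assumed output of the audio LSTM
video_posterior = [0.2, 0.6, 0.2]  # assumed output of the video model

fused = bayesian_fusion(audio_posterior, video_posterior)
predicted = labels[max(range(len(fused)), key=fused.__getitem__)]
print(predicted, fused)
```

In this sketch the audio model alone would pick the first class, while the fused decision follows the class on which the two modalities jointly agree most strongly, which is the kind of complementarity between modalities that the abstract describes.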
Keywords/Search Tags: LSTM, Three-dimensional convolutional neural network (C3D), Multimodal fusion