Research On Emotion Recognition Technology Based On Audio And Visual Perception System

Posted on: 2019-01-27

Degree: Master

Type: Thesis

Country: China

Candidate: C G Zhu

GTID: 2348330566464278

Subject: Computer Science and Technology

Abstract/Summary:

Sequentiality, variability and multimodality are three important characteristics of emotion recognition. Starting from these three aspects, emotional features are extracted from speech and facial expression sequences and then fused, and a multimodal emotion recognition system based on speech and expression images is constructed. The work draws on the information processing mechanisms of the human visual and auditory perception systems, so that computer emotion recognition more closely resembles the way the human brain processes information and achieves faster and more efficient recognition. The main contents are as follows.

For speech, a sub-band energy feature extraction method based on Mel-scale wavelet packet decomposition is proposed. Wavelet packet functions are used instead of Mel filters to decompose the speech signal, and the energy of each sub-band is then extracted. Because the speech signal is continuous in the time domain, emotional features extracted from a single frame reflect only static information about the speech. To make the features better reflect this temporal continuity, static acoustic features are combined with dynamic difference features, which effectively improves the recognition rate of the speech emotion recognition system.

Following the information processing mechanism of the visual perception system, two expression feature extraction schemes based on dynamic sequences are proposed. In the first scheme, the DMF_MeanShift algorithm is used to locate the key facial parts: eyebrows, eyes, nose and mouth. Optical flow is then computed at the located key points to obtain motion features between adjacent frames. Using these time-varying features helps capture the diversity of facial expressions. In the second scheme, optical flow is calculated between adjacent frames to obtain motion features. Finally, an expression recognition system based on an improved RNN model is constructed; using the time-varying sequence information effectively improves the expression recognition results.

Multimodal emotion recognition experiments are carried out with the RNN. Using both decision-level fusion and feature-level fusion, the experiments compare single-modal emotion recognition based on speech or expression alone with multimodal emotion recognition that combines speech and expression. The results show that the recognition rate of multimodal emotion recognition is higher than that of either single modality.
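
The abstract does not give formulas for the dynamic difference features or for the two fusion strategies, so the following Python sketch is only an illustration under stated assumptions, not the thesis's implementation. It assumes the widely used regression-style delta formula for the difference features, plain concatenation for feature-level fusion, and a weighted average of per-class probabilities for decision-level fusion. All function names and the parameters `width` and `speech_weight` are hypothetical.

```python
from typing import List, Sequence, Tuple


def delta_features(frames: Sequence[Sequence[float]], width: int = 2) -> List[List[float]]:
    """Per-frame difference (delta) features.

    Assumed formula (common in speech processing, not taken from the thesis):
        d_t = sum_{n=1..width} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..width} n^2)
    Frames requested beyond either end are replaced by the nearest edge frame.
    """
    if width < 1:
        raise ValueError("width must be >= 1")
    num_frames = len(frames)
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    deltas: List[List[float]] = []
    for t in range(num_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def with_deltas(frames: Sequence[Sequence[float]], width: int = 2) -> List[List[float]]:
    """Append the delta features to each static frame (static + dynamic)."""
    deltas = delta_features(frames, width)
    return [list(static) + dyn for static, dyn in zip(frames, deltas)]


def feature_level_fusion(speech_vec: Sequence[float], face_vec: Sequence[float]) -> List[float]:
    """Feature-level fusion, assumed here to be concatenation before classification."""
    return list(speech_vec) + list(face_vec)


def decision_level_fusion(
    speech_probs: Sequence[float],
    face_probs: Sequence[float],
    speech_weight: float = 0.5,
) -> Tuple[int, List[float]]:
    """Decision-level fusion, assumed here to be a weighted average of the
    class probabilities produced by two separately trained classifiers.
    Returns the winning class index and the fused scores."""
    if len(speech_probs) != len(face_probs):
        raise ValueError("both modalities must score the same set of classes")
    fused = [
        speech_weight * s + (1.0 - speech_weight) * f
        for s, f in zip(speech_probs, face_probs)
    ]
    return fused.index(max(fused)), fused


if __name__ == "__main__":
    # Toy one-dimensional "static" feature that rises linearly over five frames.
    static = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    print(with_deltas(static, width=1))
    # Two hypothetical classifiers that disagree on a two-class problem.
    print(decision_level_fusion([0.6, 0.4], [0.2, 0.8]))
```

In this sketch the two fusion strategies differ in where the modalities meet: feature-level fusion hands a single joint vector to one classifier, while decision-level fusion combines the outputs of classifiers that were trained per modality.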

Keywords/Search Tags:

Audio and visual perception system, Mel scale, Facial features alignment, Optical flow, Recurrent neural network

Related items

1	Optical Music Recognition Algorithm Combining Multi-scale Residual Convolutional Neural Network And Simple Recurrent Units
2	Research On Video Analysis And Intelligent Diagnosis Of Facial Movement Disorders
3	Facial Action Unit Detection And Micro-expression Analysis
4	Research Of Facial Expression Recognition Based On Image Sequence And Audio Emotion
5	Research On Algorithms For Facial Landmarks Detection
6	Facial Expression Recognition Based On Deep Learning
7	Recurrent Neural Network With Multi-scale For Sentence Classification
8	Research On Video Person Re-identification Based On Deep Learning
9	Research On Age Prediction Method Based On The Combination Of Local Facial Features And Global Facial Features
10	A Research On Video Restoration Algorithm Based On Recurrent Neural Network