Research On Emotion Recognition Technology Based On Audio And Visual Perception System

Posted on: 2019-01-27
Degree: Master
Type: Thesis
Country: China
Candidate: C G Zhu
GTID: 2348330566464278
Subject: Computer Science and Technology
Abstract/Summary:
Sequentiality, variability, and multimodality are three important characteristics of emotion recognition. Starting from these three aspects, emotional features are extracted from speech and facial expression sequences and then fused, and a multimodal emotion recognition system based on speech and expression images is constructed. The work draws on the information processing mechanisms of the human visual and auditory perception systems, so that computer emotion recognition more closely resembles the way the human brain processes emotion and achieves faster, more efficient recognition. The contents are as follows.

For speech, a sub-band energy feature extraction method based on Mel-scale wavelet packet decomposition is proposed. Wavelet packet functions are used instead of Mel filters to decompose the speech signal, and the energy of each sub-band is then extracted. Because the speech signal is continuous in the time domain, emotional features extracted from a single frame reflect only the static information of speech. To make the features better reflect this temporal continuity, static acoustic features are combined with dynamic difference features, which effectively improves the recognition rate of the speech emotion recognition system.

Following the information processing mechanism of the visual perception system, two expression feature extraction schemes based on dynamic sequences are proposed. In the first scheme, the DMF_MeanShift algorithm is used to locate key points on the eyebrows, eyes, nose, and mouth. Optical flow is then computed at these key points to obtain motion features between adjacent frames. These time-varying features help represent the variability of facial expressions. In the second scheme, optical flow is calculated between adjacent frames to obtain motion features. Finally, an expression recognition system based on an improved RNN model is constructed, and the expression recognition results are effectively improved by using the time-varying sequence information.

Multimodal emotion recognition experiments were carried out with the RNN. Using both decision-level fusion and feature-level fusion, the experiments compare single-modal emotion recognition based on speech or expression alone with multimodal emotion recognition that combines speech and expression. The results show that the recognition rate of multimodal emotion recognition is higher than that of either single modality.
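
The combination of static acoustic features with dynamic difference features described above can be illustrated with a minimal sketch. The abstract does not give the exact difference formula, so this example assumes the regression-style delta commonly used for speech features; the function names `delta_features` and `add_dynamic_features`, the `width` parameter, and the edge handling (repeating the first and last frame) are all hypothetical choices made for illustration, not the thesis's implementation.

```python
def delta_features(frames, width=2):
    """Regression-style difference features for a sequence of frames.

    frames: list of per-frame static feature lists (e.g. sub-band energies).
    width:  number of neighbouring frames on each side (assumed value).
    """
    num_frames = len(frames)
    if num_frames == 0:
        return []
    denom = 2.0 * sum(n * n for n in range(1, width + 1))

    def frame_at(t):
        # Assumption: frames beyond the sequence ends repeat the edge frame.
        return frames[min(max(t, 0), num_frames - 1)]

    deltas = []
    for t in range(num_frames):
        row = []
        for k in range(len(frames[t])):
            acc = sum(
                n * (frame_at(t + n)[k] - frame_at(t - n)[k])
                for n in range(1, width + 1)
            )
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def add_dynamic_features(frames, width=2):
    """Append the difference features to each frame's static features."""
    return [list(s) + d for s, d in zip(frames, delta_features(frames, width))]


# Toy input: the first coefficient rises by 1 per frame, the second is constant.
ramp = [[float(t), 2.0] for t in range(8)]
combined = add_dynamic_features(ramp)
```

For the toy input, each frame's feature vector doubles in length: the appended values approximate the local slope of each coefficient over time, which is the temporal information that a single static frame cannot carry.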
Keywords/Search Tags:Audio and visual perception system, Mel scale, Facial features alignment, Optical flow, Recurrent neural network