
Research on Multi-modal Emotion Recognition Based on Audio and Visual

Posted on: 2018-08-11
Degree: Master
Type: Thesis
Country: China
Candidate: Q He
Full Text: PDF
GTID: 2348330533469763
Subject: Instrumentation engineering

Abstract/Summary:
With the rise of artificial intelligence, achieving a more natural and intelligent human-computer interaction experience has attracted growing attention, making affective computing one of the current research hot spots. As an important branch of affective computing, emotion recognition has developed rapidly in recent years and has a promising future. In emotional expression, speech and facial expressions carry a great deal of emotional information, so extracting emotional features from speech and face images and then classifying the underlying emotion is of practical significance. However, the emotional information conveyed by speech or face images alone is incomplete and cannot meet expectations, so to compensate for the limitations of single-modal emotion recognition, more and more researchers have turned their attention to multi-modal emotion recognition.

In this paper, the Surrey Audio-Visual Expressed Emotion (SAVEE) database, which contains emotional material in two modalities (speech and face images), is used as the standard data source for recognizing seven emotions (anger, disgust, fear, happiness, neutral, sadness, and surprise). The main research contents are as follows:

1) Emotion recognition based on speech. This paper extracts a 92-dimensional feature vector consisting of statistical parameters of the short-term energy, pitch frequency, speech duration, first three formants, and Mel-scale Frequency Cepstral Coefficients (MFCC). After feature extraction is completed for all samples, an emotion recognition experiment is performed with a Support Vector Machine (SVM), which yields good classification results (see the first sketch following this abstract).

2) Emotion recognition based on face images. This paper uses two different methods to extract image emotion features: the Local Binary Pattern (LBP) features of the peak image of each speech segment, and the mean and standard deviation of the facial feature points across the image sequence. After feature extraction is completed for all samples, emotion recognition experiments are carried out with the SVM, and the results obtained with the different features are compared. The recognition results obtained with the sequence-based facial feature point method are better than those based on the LBP features of the peak image (see the second sketch below).

3) Multi-modal fusion emotion recognition based on speech and images. In this paper, feature-layer fusion and decision-layer fusion strategies are used to fuse the speech and image emotional information, and the emotion recognition experiments are repeated. The fused results are compared with single-modal emotion recognition, and the feature-layer and decision-layer results are compared with each other. The experiments show that multi-modal fusion emotion recognition based on speech and images outperforms single-modal emotion recognition, and that decision-layer fusion performs better than feature-layer fusion (see the third sketch below). It is also found that decision-layer fusion helps improve the recognition accuracy for fear.
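For illustration, a minimal Python sketch of the speech branch in 1) is given below, assuming librosa and scikit-learn are available. The feature set here (MFCC, short-term energy, and pitch statistics plus duration) only approximates the 92-dimensional vector described above; the formant features are omitted, and the file lists and variable names are hypothetical.

import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def speech_features(path):
    # Load the audio at its native sampling rate.
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # 13 x frames
    energy = librosa.feature.rms(y=y)                    # short-term energy
    f0 = librosa.yin(y, fmin=50, fmax=500, sr=sr)        # pitch track
    # Statistical parameters (mean and standard deviation) per feature track.
    stats = lambda m: np.concatenate([m.mean(axis=-1), m.std(axis=-1)])
    return np.concatenate([stats(mfcc), stats(energy),
                           [f0.mean(), f0.std(), len(y) / sr]])  # + duration

# Hypothetical usage: wav_paths and labels would come from the SAVEE database.
# X = np.vstack([speech_features(p) for p in wav_paths])
# clf = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
# clf.fit(X, labels)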
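The two image feature methods in 2) can likewise be sketched, assuming scikit-image for the LBP computation. Face detection, peak-frame selection, and landmark tracking are outside the scope of this sketch, and the array shapes are assumptions.

import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_face, P=8, R=1):
    # Uniform LBP codes of the peak-frame face image, pooled into a
    # normalized histogram (P + 2 possible code values).
    codes = local_binary_pattern(gray_face, P, R, method='uniform')
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def landmark_stats(landmark_seq):
    # landmark_seq: (frames, n_points, 2) facial feature point coordinates
    # for one image sequence; returns the per-coordinate mean and std.
    seq = np.asarray(landmark_seq).reshape(len(landmark_seq), -1)
    return np.concatenate([seq.mean(axis=0), seq.std(axis=0)])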
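Finally, the two fusion strategies in 3) can be sketched on top of per-modality SVMs. The decision-layer combination rule used here (averaging class posteriors) is an assumption for illustration, not necessarily the rule adopted in the thesis.

import numpy as np
from sklearn.svm import SVC

def feature_layer_fusion(Xa, Xv, y):
    # Concatenate audio and visual feature vectors, then train one SVM.
    return SVC(kernel='rbf', probability=True).fit(np.hstack([Xa, Xv]), y)

def decision_layer_fusion(Xa, Xv, y):
    # One SVM per modality; fuse by averaging the class probabilities
    # (assumed combination rule).
    clf_a = SVC(kernel='rbf', probability=True).fit(Xa, y)
    clf_v = SVC(kernel='rbf', probability=True).fit(Xv, y)
    def predict(xa, xv):
        p = (clf_a.predict_proba(xa) + clf_v.predict_proba(xv)) / 2
        return clf_a.classes_[p.argmax(axis=1)]
    return predict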
Keywords/Search Tags: Emotion recognition, speech features, facial expression features, feature-layer fusion, decision-layer fusion, Support Vector Machine (SVM)