
Research On Emotion Recognition Based On Multi-feature Fusion Of Video And Audio

Posted on: 2020-07-12  |  Degree: Master  |  Type: Thesis
Country: China  |  Candidate: J H Fan  |  Full Text: PDF
GTID: 2428330575963952  |  Subject: Engineering

Abstract/Summary:
With the rapid development of artificial intelligence over the past few decades, emotion recognition has attracted the attention of more and more researchers. Computers can only be truly intelligent if they can perceive human emotions, and emotion recognition is the first step toward that goal. This paper proposes a new feature descriptor and shows that fusing multiple visual features with discriminative audio features significantly improves video and audio emotion recognition. The main contributions are as follows:

1. A new spatiotemporal feature descriptor. Most video emotion recognition methods operate on static images and therefore lack temporal information. This paper proposes the spatiotemporal multi-valued Weber local descriptor (STMWLD), which extracts both spatial and temporal information while refining texture information, and fuses complementary global features (CNN, Gist) with local features (LBP, STMWLD). A fusion framework combining kernel entropy component analysis with discriminative multiple canonical correlation analysis (KECA+DMCCA) achieves effective feature fusion while significantly reducing redundant information. The experiments show that a single feature often cannot fully describe facial information, and that fusing complementary features is necessary to improve recognition accuracy.

2. A self-built natural expression video database. Facial expression recognition is still largely at the laboratory stage: expressions in real natural scenes are not limited to the six categories of the standard databases, and recognition in real scenes is susceptible to many complicating factors. To better reflect recognition performance in real, complex natural scenes, this paper builds its own video emotion database. Using the multi-feature fusion method proposed above, recognition accuracy on the self-built database reaches 55.45%.

3. Bimodal emotion recognition. To further improve accuracy, this paper adopts bimodal emotion recognition that combines audio and visual features. The complementary features above serve as visual features, and the 25 most discriminative prosodic features together with MFCCs serve as audio features. Because visual and audio features differ in nature, a multi-kernel support vector machine classifier (MKL-SVM) is used to fuse them. Experiments on the standard RML and SAVEE databases show that bimodal multi-feature fusion clearly outperforms either single modality, with audio-visual emotion recognition rates reaching 78.82% and 87.64%, respectively, further improving emotion recognition accuracy in video.
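Point 1 above builds on the Weber local descriptor. The following is a minimal sketch of the classic 2D WLD differential excitation on a single grayscale frame; it is not the thesis's STMWLD extension (which adds the temporal and multi-valued components), and the neighbourhood size, alpha value, and histogram binning are illustrative assumptions.

```python
# Sketch of 2D WLD differential excitation (NOT the thesis's STMWLD):
# xi = arctan(alpha * sum_i(x_i - x_c) / x_c) over a 3x3 neighbourhood.
import numpy as np
from scipy.ndimage import convolve

def wld_differential_excitation(gray: np.ndarray, alpha: float = 3.0) -> np.ndarray:
    gray = gray.astype(np.float64)
    # Convolving with this kernel gives sum(neighbours) - 8 * centre = sum_i(x_i - x_c).
    kernel = np.array([[1,  1, 1],
                       [1, -8, 1],
                       [1,  1, 1]], dtype=np.float64)
    diff_sum = convolve(gray, kernel, mode="reflect")
    # Avoid division by zero at completely dark pixels.
    centre = np.where(gray == 0, 1e-6, gray)
    return np.arctan(alpha * diff_sum / centre)

if __name__ == "__main__":
    frame = np.random.randint(0, 256, (64, 64)).astype(np.uint8)
    xi = wld_differential_excitation(frame)
    # A WLD-style descriptor quantises these excitation values into a histogram.
    hist, _ = np.histogram(xi, bins=16, range=(-np.pi / 2, np.pi / 2))
    print(hist)
```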
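Point 1 also describes fusing global and local visual features through KECA+DMCCA. The sketch below illustrates the general two-stage idea (per-view nonlinear reduction, then projection into a shared correlated subspace) using off-the-shelf KernelPCA and two-view CCA from scikit-learn as stand-ins, since KECA and DMCCA are not available as library routines; the feature dimensions and component counts are illustrative assumptions.

```python
# Stand-in sketch for KECA+DMCCA-style two-view feature fusion.
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
global_feats = rng.normal(size=(200, 512))   # e.g. CNN / Gist features per clip
local_feats = rng.normal(size=(200, 256))    # e.g. LBP / STMWLD histograms per clip

# Step 1: nonlinear dimensionality reduction of each view (stand-in for KECA).
g_red = KernelPCA(n_components=64, kernel="rbf").fit_transform(global_feats)
l_red = KernelPCA(n_components=64, kernel="rbf").fit_transform(local_feats)

# Step 2: project both views into a shared correlated subspace (stand-in for DMCCA).
cca = CCA(n_components=32)
g_c, l_c = cca.fit_transform(g_red, l_red)

# Step 3: the fused representation is the concatenation of the projected views.
fused = np.concatenate([g_c, l_c], axis=1)
print(fused.shape)  # (200, 64)
```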
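Point 3 uses an MKL-SVM to combine visual and audio features. The sketch below shows the core idea of one base kernel per modality combined into a single SVM kernel; here the kernel weights are fixed by hand rather than learned as true multiple kernel learning would do, and the feature dimensions, weights, and random data are illustrative assumptions (the audio block stands in for the 25 prosodic features plus MFCCs).

```python
# Fixed-weight multi-kernel combination for bimodal emotion classification
# (a simplified stand-in for MKL-SVM, which learns the kernel weights).
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
visual = rng.normal(size=(150, 128))     # fused visual features per clip
audio = rng.normal(size=(150, 38))       # prosodic + MFCC features per clip
labels = rng.integers(0, 6, size=150)    # six emotion classes

# One base kernel per modality, then a fixed convex combination.
K_visual = rbf_kernel(visual, gamma=1.0 / visual.shape[1])
K_audio = rbf_kernel(audio, gamma=1.0 / audio.shape[1])
w_visual, w_audio = 0.6, 0.4             # hand-picked weights (assumption)
K_combined = w_visual * K_visual + w_audio * K_audio

# Train an SVM on the precomputed combined kernel.
clf = SVC(kernel="precomputed", C=1.0)
clf.fit(K_combined, labels)
print(clf.score(K_combined, labels))     # training accuracy on toy data
```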
Keywords/Search Tags:audio-visual emotion recognition, multi-feature fusion, STMWLD, KECA+DMCCA, MKL-SVM