
Research On Emotion Recognition Based On Multi-feature Fusion Of Video And Audio

Posted on: 2020-07-12  |  Degree: Master  |  Type: Thesis
Country: China  |  Candidate: J H Fan  |  Full Text: PDF
GTID: 2428330575963952  |  Subject: Engineering

Abstract/Summary:
With the rapid development of artificial intelligence over the past few decades, emotion recognition has attracted the attention of more and more researchers. Computers can only be truly intelligent if they can perceive human emotions, and emotion recognition is the first step toward that goal. This paper proposes a new feature descriptor and shows that fusing multiple visual features with discriminative audio features significantly improves video and audio emotion recognition. The main contributions are as follows:

1. A new spatiotemporal feature descriptor. Most video emotion recognition methods operate on static images and therefore lack temporal information. This paper proposes the spatiotemporal multi-valued Weber local descriptor (STMWLD), which extracts both spatial and temporal information while refining texture information, and fuses complementary global features (CNN, Gist) with local features (LBP, STMWLD). A fusion framework combining kernel entropy component analysis with discriminative multiple canonical correlation analysis (KECA+DMCCA) achieves effective feature fusion while significantly reducing redundant information. The experiments show that a single feature often cannot fully describe facial information, and that fusing complementary features is necessary to improve recognition accuracy.

2. A self-built natural expression video database. Facial expression recognition is still largely at the laboratory stage: expressions in real natural scenes are not limited to the six categories of the standard databases, and recognition in real scenes is susceptible to many complicating factors. To better reflect recognition performance in real, complex natural scenes, this paper builds its own video emotion database. Using the multi-feature fusion method proposed above, recognition accuracy on the self-built database reaches 55.45%.

3. Bimodal emotion recognition. To further improve accuracy, this paper adopts bimodal emotion recognition that combines audio and visual features. The complementary features above serve as visual features, and the 25 most discriminative prosodic features together with MFCCs serve as audio features. Because visual and audio features differ in nature, a multi-kernel support vector machine classifier (MKL-SVM) is used to fuse them. Experiments on the standard RML and SAVEE databases show that bimodal multi-feature fusion clearly outperforms either single modality, with audio-visual emotion recognition rates reaching 78.82% and 87.64%, respectively, further improving emotion recognition accuracy in video.
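Point 1 above builds on the Weber local descriptor. The following is a minimal sketch of the classic 2D WLD differential excitation on a single grayscale frame; it is not the thesis's STMWLD extension (which adds the temporal and multi-valued components), and the neighbourhood size, alpha value, and histogram binning are illustrative assumptions.

```python
# Sketch of 2D WLD differential excitation (NOT the thesis's STMWLD):
# xi = arctan(alpha * sum_i(x_i - x_c) / x_c) over a 3x3 neighbourhood.
import numpy as np
from scipy.ndimage import convolve

def wld_differential_excitation(gray: np.ndarray, alpha: float = 3.0) -> np.ndarray:
    gray = gray.astype(np.float64)
    # Convolving with this kernel gives sum(neighbours) - 8 * centre = sum_i(x_i - x_c).
    kernel = np.array([[1,  1, 1],
                       [1, -8, 1],
                       [1,  1, 1]], dtype=np.float64)
    diff_sum = convolve(gray, kernel, mode="reflect")
    # Avoid division by zero at completely dark pixels.
    centre = np.where(gray == 0, 1e-6, gray)
    return np.arctan(alpha * diff_sum / centre)

if __name__ == "__main__":
    frame = np.random.randint(0, 256, (64, 64)).astype(np.uint8)
    xi = wld_differential_excitation(frame)
    # A WLD-style descriptor quantises these excitation values into a histogram.
    hist, _ = np.histogram(xi, bins=16, range=(-np.pi / 2, np.pi / 2))
    print(hist)
```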
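Point 1 also describes fusing global and local visual features through KECA+DMCCA. The sketch below illustrates the general two-stage idea (per-view nonlinear reduction, then projection into a shared correlated subspace) using off-the-shelf KernelPCA and two-view CCA from scikit-learn as stand-ins, since KECA and DMCCA are not available as library routines; the feature dimensions and component counts are illustrative assumptions.

```python
# Stand-in sketch for KECA+DMCCA-style two-view feature fusion.
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
global_feats = rng.normal(size=(200, 512))   # e.g. CNN / Gist features per clip
local_feats = rng.normal(size=(200, 256))    # e.g. LBP / STMWLD histograms per clip

# Step 1: nonlinear dimensionality reduction of each view (stand-in for KECA).
g_red = KernelPCA(n_components=64, kernel="rbf").fit_transform(global_feats)
l_red = KernelPCA(n_components=64, kernel="rbf").fit_transform(local_feats)

# Step 2: project both views into a shared correlated subspace (stand-in for DMCCA).
cca = CCA(n_components=32)
g_c, l_c = cca.fit_transform(g_red, l_red)

# Step 3: the fused representation is the concatenation of the projected views.
fused = np.concatenate([g_c, l_c], axis=1)
print(fused.shape)  # (200, 64)
```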
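Point 3 uses an MKL-SVM to combine visual and audio features. The sketch below shows the core idea of one base kernel per modality combined into a single SVM kernel; here the kernel weights are fixed by hand rather than learned as true multiple kernel learning would do, and the feature dimensions, weights, and random data are illustrative assumptions (the audio block stands in for the 25 prosodic features plus MFCCs).

```python
# Fixed-weight multi-kernel combination for bimodal emotion classification
# (a simplified stand-in for MKL-SVM, which learns the kernel weights).
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
visual = rng.normal(size=(150, 128))     # fused visual features per clip
audio = rng.normal(size=(150, 38))       # prosodic + MFCC features per clip
labels = rng.integers(0, 6, size=150)    # six emotion classes

# One base kernel per modality, then a fixed convex combination.
K_visual = rbf_kernel(visual, gamma=1.0 / visual.shape[1])
K_audio = rbf_kernel(audio, gamma=1.0 / audio.shape[1])
w_visual, w_audio = 0.6, 0.4             # hand-picked weights (assumption)
K_combined = w_visual * K_visual + w_audio * K_audio

# Train an SVM on the precomputed combined kernel.
clf = SVC(kernel="precomputed", C=1.0)
clf.fit(K_combined, labels)
print(clf.score(K_combined, labels))     # training accuracy on toy data
```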
Keywords/Search Tags:audio-visual emotion recognition, multi-feature fusion, STMWLD, KECA+DMCCA, MKL-SVM