Emotion plays an indispensable role in daily life. People often convey their psychological states to one another through emotional expression. Human emotional states change continuously and are expressed through a variety of modalities. Dimensional emotion models describe such complex, subtle, and continuous states by establishing a continuous emotional space, and a growing number of researchers have therefore turned to continuous emotion recognition along different dimensions. Existing video-based methods for dimensional emotion recognition do not carefully weigh the contribution of individual features to recognition, and they ignore the influence of earlier emotional states on later ones. In multi-modal dimensional emotion recognition, effectively learning multi-modal features and reasonably computing the fusion ratio between modalities is a further challenge. This paper therefore studies the effect of context information on dimensional emotion recognition from two angles: a video-based attention model and a multi-modal attention model. The specific contributions are as follows.

(1) A continuous dimensional emotion recognition method based on an emotion-embedded visual attention model is proposed. A two-stage model overcomes the tendency of traditional emotion recognition to treat all facial regions alike. First, a deep convolutional neural network learns facial features, retaining the texture and hierarchical information of the image while avoiding both the instability of shallow features and tedious hand-crafted features. Then, a visual attention model based on a long short-term memory (LSTM) network uses context information to compute the contribution of different facial regions to emotion recognition and to strengthen the emotionally salient areas. Finally, the emotional state of the previous moment is mapped to a discrete emotion category by K-means clustering and fused with the facial features of the current moment, further sharpening the emotionally salient features and enhancing emotional continuity across contexts. On the AVEC2016 and AVEC2017 databases from the International Audio/Visual Emotion Challenge and Workshop, the concordance correlation coefficient (CCC) of the best model on the Arousal dimension increases by 1.5% and 2%, respectively, and the model trains substantially faster because it does not rely on hand-crafted features.
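To make the attention step concrete, the following is a minimal sketch of how region-level visual attention with an embedded previous-emotion category could be realized. All names, shapes, and layer sizes are illustrative assumptions rather than the thesis's actual code: region_feats stands for CNN features of R facial regions, lstm_state for the LSTM context vector, and prev_emotion for the K-means cluster id of the previous moment's emotional state.

```python
import tensorflow as tf

R, D, H, K, E = 8, 512, 256, 16, 32  # regions, feature dim, LSTM units, clusters, embedding dim

class EmotionEmbeddedAttention(tf.keras.layers.Layer):
    """Scores each facial region against the temporal context, then fuses the
    attended feature with an embedding of the previous emotion category."""

    def __init__(self):
        super().__init__()
        self.w_feat = tf.keras.layers.Dense(H)       # project region features
        self.w_score = tf.keras.layers.Dense(1)      # scalar score per region
        self.emotion_embed = tf.keras.layers.Embedding(K, E)
        self.fuse = tf.keras.layers.Dense(H, activation="tanh")

    def call(self, region_feats, lstm_state, prev_emotion):
        # Additive attention: score_i = w^T tanh(W_f x_i + h_{t-1})
        ctx = tf.expand_dims(lstm_state, 1)                              # (batch, 1, H)
        scores = self.w_score(tf.tanh(self.w_feat(region_feats) + ctx))  # (batch, R, 1)
        alpha = tf.nn.softmax(scores, axis=1)        # attention weights over regions
        attended = tf.reduce_sum(alpha * region_feats, axis=1)           # (batch, D)

        # Fuse with the embedded previous-emotion category (K-means cluster id),
        # carrying the prior emotional state into the current moment
        emo = self.emotion_embed(prev_emotion)                           # (batch, E)
        return self.fuse(tf.concat([attended, emo], axis=-1)), alpha

# Example call with random inputs
layer = EmotionEmbeddedAttention()
fused, alpha = layer(tf.random.normal([2, R, D]),   # region features
                     tf.zeros([2, H]),              # previous LSTM state
                     tf.constant([3, 7]))           # previous emotion clusters
```

Feeding the previous hidden state into the region scores is what lets context decide which facial areas matter at each moment; the emotion embedding then biases the fused feature toward temporally consistent predictions.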
(2) A multi-modal dimensional emotion recognition method based on a hierarchical attention mechanism is proposed. To overcome the shortcomings of existing methods in audio feature learning and in computing the multi-modal fusion ratio, a hierarchical mechanism comprising a frequency attention model and a multi-modal attention model is introduced. First, a frequency attention mechanism is added to the audio modality: by computing the contribution of different frequency bands to emotional expression, the model focuses on the frequencies most strongly correlated with recognition performance and reduces interference from irrelevant frequencies. A voice gating switch is then added to the LSTM to extract the parts of the audio signal that are effective for emotion recognition. A multi-modal attention mechanism computes the contribution of the two modalities to emotion recognition and fuses them, compensating for the incomplete expressiveness of any single modality; sketches of these attention layers and of the CCC objective appear at the end of this section. Finally, an improved loss function handles the missing-modality cases of a face without a voice or a voice without a face, improving the robustness of the model. On the AVEC2016 and AVEC2017 databases, the CCC on the Arousal and Valence dimensions increases by 3.9% and 2% on AVEC2016, and by 8.5% and 1% on AVEC2017, respectively.

(3) A prototype system for dimensional emotion recognition based on the attention models is designed and implemented in Python 3 with the TensorFlow framework. The prototype system comprises three modules: data processing, video emotional-salient-feature learning, and multi-modal emotional-salient-feature learning. The implementation of the prototype system demonstrates and verifies the effectiveness and practicality of the proposed methods.
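As referenced in contribution (2), the sketch below shows one plausible form of the two attention levels: a frequency attention that re-weights spectrogram bins, and a modality attention that computes the audio/video fusion ratio. The layer structure, dimensions, and use of tf.keras are assumptions for illustration; the voice gating switch and the improved missing-modality loss are not reproduced here.

```python
import tensorflow as tf

F_BINS, D_A, D_V, D_FUSE = 64, 128, 256, 128  # freq bins, audio/video dims, fused dim

class FrequencyAttention(tf.keras.layers.Layer):
    """Weights each frequency bin of a spectrogram by its estimated
    contribution to emotional expression."""

    def __init__(self, bins):
        super().__init__()
        self.score = tf.keras.layers.Dense(bins, activation="softmax")

    def call(self, spec):                      # spec: (batch, T, F_BINS)
        pooled = tf.reduce_mean(spec, axis=1)  # (batch, F_BINS) time-pooled summary
        w = self.score(pooled)                 # per-bin contribution weights
        return spec * tf.expand_dims(w, 1)     # emphasize relevant frequencies

class ModalityAttention(tf.keras.layers.Layer):
    """Computes an explicit fusion ratio between audio and video features."""

    def __init__(self, dim):
        super().__init__()
        self.proj_a = tf.keras.layers.Dense(dim)
        self.proj_v = tf.keras.layers.Dense(dim)
        self.score = tf.keras.layers.Dense(1)

    def call(self, feat_a, feat_v):            # (batch, D_A), (batch, D_V)
        stacked = tf.stack([self.proj_a(feat_a), self.proj_v(feat_v)], axis=1)
        alpha = tf.nn.softmax(self.score(tf.tanh(stacked)), axis=1)  # (batch, 2, 1)
        return tf.reduce_sum(alpha * stacked, axis=1), alpha  # fused feature + ratio

# Example: re-weight a spectrogram, then fuse with a video feature
spec = FrequencyAttention(F_BINS)(tf.random.normal([2, 100, F_BINS]))
audio_feat = tf.reduce_mean(spec, axis=1) @ tf.random.normal([F_BINS, D_A])
fused, ratio = ModalityAttention(D_FUSE)(audio_feat, tf.random.normal([2, D_V]))
```

Because the fusion ratio alpha is computed per sample, it can be inspected directly, which matches the goal of calculating, rather than fixing, each modality's contribution.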
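Finally, since both contributions report gains in CCC, the standard concordance-correlation objective used in AVEC-style work is sketched below. The thesis's improved loss for missing modalities is not specified in this summary, so only the plain 1 - CCC term is shown; the epsilon is an assumed numerical safeguard.

```python
import tensorflow as tf

def ccc_loss(y_true, y_pred):
    """1 - CCC, where CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2).
    CCC rewards predictions that correlate with the labels AND match their
    mean and variance, which suits continuous arousal/valence traces."""
    mx, my = tf.reduce_mean(y_true), tf.reduce_mean(y_pred)
    vx = tf.reduce_mean(tf.square(y_true - mx))
    vy = tf.reduce_mean(tf.square(y_pred - my))
    cov = tf.reduce_mean((y_true - mx) * (y_pred - my))
    ccc = 2.0 * cov / (vx + vy + tf.square(mx - my) + 1e-8)
    return 1.0 - ccc

# Example on a short arousal trace
print(ccc_loss(tf.constant([0.1, 0.3, 0.5]), tf.constant([0.2, 0.3, 0.4])))
```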