
Research On Dynamic Emotion Recognition Based On Spatial-Temporal Neural Networks

Posted on: 2019-05-03 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: T Zhang | Full Text: PDF
GTID: 1368330590960107 | Subject: Information and Communication Engineering
Abstract/Summary:
Emotion recognition has wide applications in human-machine interaction and is therefore drawing more and more attention. To recognize human emotion, researchers use various electronic devices to collect signals that reflect emotional states, such as facial expression sequences, electroencephalogram (EEG) recordings and acoustic waves, among which EEG and facial expression sequences are two widely used signal modalities. Both are time-varying affective phenomena with a similar spatial-temporal structure: they contain not only spatial components at a single moment but also contextual dependencies among temporal slices. To better recognize human emotion, the crucial spatial and temporal dependencies in EEG and facial expression sequences should be well modeled. Motivated by this, we investigate dynamic emotion recognition based on EEG and facial expression signals by designing neural networks with spatial-temporal architectures. Moreover, since spatial context is more complex than temporal context, we first investigate how to capture spatial dependencies well, as the basis for designing spatial-temporal neural networks. In detail, the major innovative achievements of this dissertation include the following aspects:

(1) We propose a bilinear convolutional network (BCN), inspired by the affective cognitive mechanism of the neural system, and apply it to feature learning on facial images for facial emotion recognition (FER). To imitate the early perceptual processing of affective cognition, for a given facial image we first construct a feature matrix from scale-invariant feature transform (SIFT) features extracted from regions around facial landmark points. Then, to further imitate deep perceptual processing, the feature matrix is fed into a well-designed BCN model to learn discriminative features for FER. In detail, the BCN consists of a bilinear projection layer, a 1D convolution layer, a non-linear activation layer, etc. In the process of emotion recognition, the BCN characterizes the relationship between SIFT feature matrices and their corresponding high-level semantic information. By training this model, we learn a set of features that are well suited to classifying facial expressions across different facial views (see the illustrative sketch after item (2) below).

(2) We propose a novel spatial-temporal recurrent neural network (STRNN) that integrates feature learning from the spatial and temporal information of both EEG signals and facial expression sequences into a unified spatial-temporal dependency model. In STRNN, a multidirectional spatial recurrent neural network (SRNN) layer is first employed to capture the spatially co-occurrent variations of human emotions, especially long-range contextual cues, by traversing the spatial regions of each temporal slice along different directions. A bidirectional temporal recurrent neural network (TRNN) layer is then employed to learn discriminative features that characterize the temporal dependencies of the sequences produced by the SRNN layer. To further select salient regions with more discriminative ability for emotion recognition, we apply a sparse projection to the hidden states of the SRNN and TRNN layers. Consequently, the proposed two-layer RNN model provides an effective way to exploit both the spatial and temporal dependencies of the input signals for emotion recognition.
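As a rough illustration of the bilinear projection and 1D convolution described in item (1), the sketch below maps a per-image SIFT feature matrix to expression logits. The layer sizes, the number of landmark regions, and the use of PyTorch are assumptions for illustration; this is a minimal sketch, not the dissertation's exact architecture.

```python
import torch
import torch.nn as nn

class BilinearConvNet(nn.Module):
    """Hedged sketch of a bilinear-projection + 1D-convolution classifier.

    Input: one SIFT feature matrix of shape (d, n) per face image,
    e.g. d = 128 (descriptor dimension) and n = 49 (landmark regions).
    All sizes are illustrative assumptions.
    """
    def __init__(self, d=128, n=49, d_proj=64, n_proj=32, num_classes=7):
        super().__init__()
        # Bilinear projection Y = U^T X V maps (d, n) -> (d_proj, n_proj)
        self.U = nn.Parameter(torch.randn(d, d_proj) * 0.01)
        self.V = nn.Parameter(torch.randn(n, n_proj) * 0.01)
        # 1D convolution along the projected region axis
        self.conv1d = nn.Conv1d(d_proj, 32, kernel_size=3, padding=1)
        self.act = nn.ReLU()
        self.fc = nn.Linear(32 * n_proj, num_classes)

    def forward(self, x):               # x: (batch, d, n)
        y = self.U.t() @ x @ self.V     # bilinear projection -> (batch, d_proj, n_proj)
        y = self.act(self.conv1d(y))    # 1D conv over regions -> (batch, 32, n_proj)
        return self.fc(y.flatten(1))    # expression logits

# Usage: a batch of 8 feature matrices (128-D SIFT descriptors at 49 landmarks)
model = BilinearConvNet()
logits = model(torch.randn(8, 128, 49))   # -> (8, 7)
```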
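Likewise, a minimal sketch of the two-layer spatial-temporal RNN idea in item (2), assuming each temporal slice is represented by a fixed set of region features. A forward-plus-reverse region scan stands in for the multidirectional traversal, and the sparse projection on hidden states is omitted for brevity; sizes are illustrative.

```python
import torch
import torch.nn as nn

class STRNN(nn.Module):
    """Simplified sketch of the spatial-temporal RNN idea.

    Each temporal slice is a set of R spatial regions with F features;
    a bidirectional GRU scans the region sequence of every slice (a
    stand-in for the multidirectional SRNN), and a bidirectional GRU
    then models temporal dependencies across slices.
    """
    def __init__(self, feat_dim=32, spatial_hidden=64, temporal_hidden=64,
                 num_classes=7):
        super().__init__()
        # spatial RNN applied to the region sequence of every slice
        self.srnn = nn.GRU(feat_dim, spatial_hidden, batch_first=True,
                           bidirectional=True)
        # temporal RNN over the per-slice spatial summaries
        self.trnn = nn.GRU(2 * spatial_hidden, temporal_hidden,
                           batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * temporal_hidden, num_classes)

    def forward(self, x):                       # x: (batch, T, R, feat_dim)
        b, t, r, f = x.shape
        spatial_out, _ = self.srnn(x.reshape(b * t, r, f))
        slice_feat = spatial_out.mean(dim=1)    # pool region hidden states
        temporal_out, _ = self.trnn(slice_feat.reshape(b, t, -1))
        return self.fc(temporal_out[:, -1])     # logits from the last time step

# Usage: 8 sequences, 16 frames, 25 spatial regions, 32-D region features
model = STRNN()
logits = model(torch.randn(8, 16, 25, 32))      # -> (8, 7)
```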
(3) We propose a deep neural network for EEG-based emotion recognition that employs both high-order and first-order statistics features, where the high-order statistics features are characterized by symmetric positive definite (SPD) matrices with a spatial-temporal structure. Noting that SPD matrices are theoretically embedded on Riemannian manifolds, we propose an end-to-end deep manifold-to-manifold transforming network (DMT-Net), which makes SPD matrices flow from one Riemannian manifold to another, more discriminative one, facilitating EEG-based emotion recognition. To learn discriminative SPD features from both spatial and temporal dependencies, we propose three novel layers on manifolds: (a) a local SPD convolutional layer, (b) a nonlinear SPD activation layer, and (c) a Riemannian-preserved recursive layer. The SPD property is preserved through all layers without the singular value decomposition (SVD) operation that existing methods must perform at considerable computational cost. Furthermore, a diagonalizing SPD layer is designed to efficiently compute the final metric for the classification task. Finally, DMT-Net is fused with a first-order layer that captures temporal evolution information to further improve recognition performance (see the illustrative sketch after item (4) below).

(4) We propose a novel tensor graph convolutional neural network (TGCNN) for EEG-based emotion recognition. In this framework, EEG sequences are first modeled as dynamic graphs by treating electrodes as nodes, where each slice of the dynamic graph is treated as a subgraph. To globally capture dependencies among subgraphs as well as among nodes within each subgraph, a graph-preserving layer is proposed to recurrently memorize the salient nodes of the subgraphs through two critical operations, namely cross-graph convolution and graph pooling. Specifically, for cross-graph convolution, a parameterized Kronecker-sum operation is proposed to generate a conjunctive adjacency matrix characterizing the relationship between every pair of nodes across two subgraphs. With this operation, general graph convolution can be performed efficiently, reducing memory and computational cost. By encapsulating dynamic graphs in a recursive learning process, EEG-based emotion recognition is achieved efficiently, with both the temporal evolution and the spatial layout of the graphs well modeled.
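For item (3), the sketch below shows one common way SPD descriptors can be built from EEG segments as spatial covariance matrices, followed by an SPD-preserving congruence transform W S W^T (plus a small ridge) that avoids any eigen/SVD step. The congruence layer is an assumed stand-in for the local SPD convolutional layer, not the exact DMT-Net formulation; all sizes are illustrative.

```python
import torch
import torch.nn as nn

def spd_from_eeg(segment, eps=1e-4):
    """Spatial covariance of an EEG segment (channels x samples) used as an
    SPD descriptor; the small ridge keeps it strictly positive definite."""
    segment = segment - segment.mean(dim=-1, keepdim=True)
    cov = segment @ segment.transpose(-1, -2) / segment.shape[-1]
    return cov + eps * torch.eye(cov.shape[-1])

class CongruenceSPDLayer(nn.Module):
    """Illustrative SPD-preserving transform: the congruence map W S W^T
    (plus a ridge) stays SPD without any eigendecomposition or SVD.
    This is a hedged stand-in, not the exact DMT-Net layer."""
    def __init__(self, dim_in=32, dim_out=16, eps=1e-4):
        super().__init__()
        self.W = nn.Parameter(torch.randn(dim_out, dim_in) * 0.1)
        self.eps = eps

    def forward(self, S):                        # S: (batch, dim_in, dim_in), SPD
        out = self.W @ S @ self.W.transpose(-1, -2)
        return out + self.eps * torch.eye(out.shape[-1])

# Usage: SPD descriptors from 8 EEG segments with 32 channels and 256 samples each
spd = torch.stack([spd_from_eeg(torch.randn(32, 256)) for _ in range(8)])
layer = CongruenceSPDLayer()
print(layer(spd).shape)                          # -> (8, 16, 16)
```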
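For item (4), the sketch below builds the conjunctive adjacency of two consecutive subgraphs with a (scaled) Kronecker sum and applies one cross-graph convolution followed by a simple pooling step. The pairwise feature construction, the scalar weights alpha/beta, and the max pooling are illustrative guesses rather than the TGCNN definitions.

```python
import torch
import torch.nn as nn

def kronecker_sum(a1, a2):
    """Kronecker sum A1 (+) A2 = A1 (x) I_m + I_n (x) A2: a conjunctive
    adjacency over all node pairs (i, j) drawn from two subgraphs."""
    n, m = a1.shape[0], a2.shape[0]
    return torch.kron(a1, torch.eye(m)) + torch.kron(torch.eye(n), a2)

class CrossGraphConv(nn.Module):
    """Hedged sketch of one cross-graph-convolution step: pairwise node
    features are filtered with the scaled Kronecker-sum adjacency and
    pooled back to a per-node representation."""
    def __init__(self, feat_in=5, feat_out=16):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(1.0))   # weight of subgraph 1
        self.beta = nn.Parameter(torch.tensor(1.0))    # weight of subgraph 2
        self.W = nn.Parameter(torch.randn(feat_in, feat_out) * 0.1)

    def forward(self, a1, x1, a2, x2):
        n, m = a1.shape[0], a2.shape[0]
        adj = kronecker_sum(self.alpha * a1, self.beta * a2)      # (n*m, n*m)
        pair = (x1.unsqueeze(1) + x2.unsqueeze(0)).reshape(n * m, -1)
        out = torch.relu(adj @ pair @ self.W)                     # (n*m, feat_out)
        return out.reshape(n, m, -1).max(dim=0).values            # pool to m nodes

# Usage: two consecutive EEG slices, 8 electrodes, 5 band-power features per node
a = torch.rand(8, 8); a = (a + a.t()) / 2                         # symmetric adjacency
conv = CrossGraphConv()
print(conv(a, torch.randn(8, 5), a, torch.randn(8, 5)).shape)     # -> (8, 16)
```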
Keywords/Search Tags: emotion recognition, facial expression sequence, EEG signal, spatial-temporal neural networks, bilinear convolutional networks, recurrent neural networks, manifold-to-manifold transforming, graph convolutional neural networks