Facial expressions are the most visible way people convey their moods and can be subdivided into macro-expressions and micro-expressions. Macro-expressions are easy to control and can be disguised, and are therefore of limited research value. Unlike macro-expressions, which are readily observed, micro-expressions are facial movements that people involuntarily produce when trying to hide intense feelings, and they have great application potential in psychotherapy, criminal interrogation, and marketing. Some scholars have therefore combined micro-expressions with artificial intelligence methods, proposing automatic micro-expression spotting and recognition techniques to analyze facial micro-expressions accurately. In terms of implementation, micro-expression analysis can generally be decomposed into two main tasks: micro-expression spotting and micro-expression recognition. With the rapid development of deep learning technology, micro-expression spotting and recognition based on deep learning have gradually become a hot spot in micro-expression analysis research.

At present, the main challenge of deep-learning-based micro-expression spotting is that the facial action amplitude of a micro-expression is extremely small and its duration extremely short, so deep networks represent the weak deformation insufficiently, which reduces the spotting accuracy of micro-expression segments. Compared with spotting, the difficulty of micro-expression recognition lies in the identity bias carried by each sample subject, which weakens the robustness of micro-expression feature representations. In addition, existing micro-expression datasets contain few samples and lack diversity, which makes network learning more difficult and leaves generalization performance in need of improvement. Motivated by these problems, this study investigates micro-expression spotting and
recognition based on dual-modality information fusion, aiming to improve the robustness of deep networks' micro-expression feature representation. The main contributions of our work are as follows:

(1) An Adaptive Enhanced Micro-expression Spotting Network (AEM-Net) is proposed for micro-expression spotting. First, RGB images are combined with optical-flow images, and an inflated dual-stream 3D network serves as the feature encoder, extracting the sequence features of the RGB and optical-flow streams separately and then fusing them effectively. So that the fused features carry richer detail, AEM-Net builds a multi-stage, multi-scale feature extraction network that takes the combined RGB and optical-flow features as input, suppressing the loss of detail information while capturing overall representational information. To strengthen attention to micro-expression features, a channel attention mechanism is introduced to adaptively enhance these minute features. In addition, AEM-Net contains a post-processing module that first filters out proposals whose frame counts fall outside the plausible duration range, and then suppresses outlying negative samples by computing the intersection over union (IoU) between each proposal and the set of highest-confidence proposals. Experimental results on the CAS(ME)^2 and SAMM-LV long-video datasets validate the viability of the method.

(2) A Micro-expression Deep Mutual Learning network (MDML) based on dual-modality mutual learning is proposed. The method uses a dual-stream network as the overall architecture and takes the onset-to-apex difference frames and the optical flow as its two inputs, exploring multiple ways to remove identity information. MDML sets up an interactive learning mechanism that obtains a more discriminative micro-expression representation through supervised learning and interactive learning of the two modalities at different stages of the network, so the generalization ability of the network is
improved. To further optimize recognition performance, MDML also constructs a network that learns the feature distributions of multiple source domains to guide the learning of the feature distribution of the micro-expression data. Experimental results on three datasets, CASME II, SAMM, and SMIC, demonstrate the effectiveness of the proposed method.
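The post-processing described for AEM-Net amounts to duration filtering followed by confidence-ranked IoU suppression over temporal proposals. The following is a minimal sketch of that idea; the function names, duration bounds, and IoU threshold are illustrative and not taken from AEM-Net itself:

```python
def temporal_iou(a, b):
    """1-D IoU between two (start, end, score) temporal proposals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def postprocess(proposals, min_len, max_len, iou_thresh):
    """Keep proposals of plausible micro-expression duration, then greedily
    suppress any proposal that overlaps a higher-confidence kept one."""
    # Duration filter: discard proposals outside the plausible frame range.
    candidates = [p for p in proposals if min_len <= p[1] - p[0] <= max_len]
    # Rank by confidence so the most confident proposals are kept first.
    candidates.sort(key=lambda p: p[2], reverse=True)
    kept = []
    for p in candidates:
        if all(temporal_iou(p, k) < iou_thresh for k in kept):
            kept.append(p)
    return kept
```

For example, `postprocess([(0, 10, 0.9), (2, 9, 0.8), (30, 40, 0.7)], 4, 50, 0.5)` keeps the first and third proposals, since the second overlaps the first with IoU 0.7.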
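The interactive learning between MDML's two streams can be illustrated with the standard deep-mutual-learning objective, in which each stream minimizes its own cross-entropy plus a KL-divergence term pulling it toward its peer's prediction. This is a sketch under that assumption; MDML's exact losses and the stages at which they are applied are not specified here:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_div(p, q):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_learning_losses(logits_diff, logits_flow, label):
    """Per-stream losses: each stream's own cross-entropy plus a mimicry
    term toward the other stream's predicted distribution."""
    p_diff = softmax(logits_diff)   # difference-frame stream prediction
    p_flow = softmax(logits_flow)   # optical-flow stream prediction
    ce_diff = -math.log(p_diff[label])
    ce_flow = -math.log(p_flow[label])
    loss_diff = ce_diff + kl_div(p_flow, p_diff)
    loss_flow = ce_flow + kl_div(p_diff, p_flow)
    return loss_diff, loss_flow
```

When the two streams already agree, the KL terms vanish and each loss reduces to that stream's cross-entropy; disagreement adds a penalty that drives the streams toward a shared, more discriminative representation.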