Facial expression recognition, a critical technology in the field of computer vision, has a wide range of applications in daily life. In recent years, Facial Action Coding System (FACS)-based facial expression recognition has become one of the mainstream directions in facial analysis research. However, the application of such methods is limited by the manual labeling of Action Units (AUs). The goal of this paper is therefore to achieve accurate and robust automatic facial AU detection by exploiting the rich contextual relationships and inherent properties of AUs in both the spatial and temporal dimensions, taking facial AUs as the research subject and spatial-temporal co-occurrence as the constraint, so as to further advance facial AU and expression analysis by alleviating the above challenges. The main contributions of this paper are summarized as follows.

(1) A Lite AU Convolution Network (LAUCN) with dual sparsity and co-occurrence constraints is proposed, in which the output label space is jointly constrained by the sparsity and co-occurrence relationships of facial AUs. Instead of the traditional handcrafted feature extraction process, LAUCN automatically generates more representative features in a data-driven manner while requiring only a small number of samples. The experimental results on the CK+ dataset, namely an average F1-score of 69.3% and an accuracy of 91.4%, indicate that introducing spatial label sparsity and co-occurrence contributes to the performance improvement.

(2) A Weakly-Supervised Dual-Attention Fusion Network (WS-DAFNet) is proposed, which exploits two attention mechanisms to selectively extract deep features along multiple dimensions and capture AU correlations. To ensure the reasonableness of the introduced attention, a weakly supervised learning module is also designed to adaptively refine the attention maps. Moreover, the efficient aggregation of global features, which capture holistic facial spatial properties, and local features, which capture subtle local appearance changes, is also key to the accurate AU detection of WS-DAFNet. Experimental results on the BP4D dataset show that the recognition accuracy and F1-score of the proposed WS-DAFNet are increased by 3.7% and 4.5% respectively, compared with existing popular AU detection algorithms such as EAC-Net.

(3) Different from the above two methods, a Spatial-Temporal Collaborative Deep Network (StAUNet) is proposed for image-sequence-based AU detection. StAUNet jointly considers spatial representation, temporal modeling, and AU co-occurrence, and reaches the mainstream level of AU detection through the collaborative decision of the three. Experimental results on the BP4D dataset show that the F1-score of StAUNet increases substantially, by 7.9%, compared with state-of-the-art methods such as EAC-LSTM, which demonstrates the importance of spatial representation, temporal modeling, and AU co-occurrence for AU detection.
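The dual sparsity/co-occurrence constraint described for LAUCN in (1) could take many concrete forms; one minimal sketch, assuming an L1 sparsity penalty on the predicted AU probabilities and a pairwise penalty weighted by a prior co-occurrence matrix (the function name, loss weights, and exact penalty forms below are illustrative assumptions, not the thesis's actual formulation):

```python
import numpy as np

def dual_constraint_loss(probs, labels, cooc, lam_sparse=0.01, lam_cooc=0.1):
    """Hypothetical dual-constraint AU loss sketch:
    binary cross-entropy + L1 sparsity on the predicted label vector
    + a penalty on jointly activating AU pairs that the prior
    co-occurrence matrix says rarely appear together."""
    eps = 1e-8
    # standard multi-label binary cross-entropy
    bce = -np.mean(labels * np.log(probs + eps)
                   + (1 - labels) * np.log(1 - probs + eps))
    # sparsity: at any instant only a few AUs tend to be active
    sparsity = np.sum(np.abs(probs))
    # co-occurrence: penalise p_i * p_j weighted by (1 - cooc_ij),
    # so unlikely AU pairs cost more when co-activated
    pair = probs[:, None] * probs[None, :]
    cooc_penalty = np.sum(pair * (1.0 - cooc)) / 2.0
    return bce + lam_sparse * sparsity + lam_cooc * cooc_penalty
```

With `cooc` set to all ones (every pair freely co-occurs) the pairwise term vanishes, so the loss reduces to cross-entropy plus sparsity; zeroing an entry raises the cost of predicting that pair jointly.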
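The "two attention mechanisms" of WS-DAFNet in (2) are not specified in detail here; a common dual-attention pattern gates features along the channel dimension and the spatial dimension in turn. The sketch below assumes that pattern with simple sigmoid gates over pooled statistics; it is an illustrative stand-in, not the network's actual attention design:

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    """Gate each channel by a sigmoid of its global average (feat: C x H x W)."""
    weights = _sigmoid(feat.mean(axis=(1, 2)))        # one weight per channel
    return feat * weights[:, None, None]

def spatial_attention(feat):
    """Gate each spatial location by a sigmoid of its channel-averaged response."""
    mask = _sigmoid(feat.mean(axis=0))                # one weight per (h, w) location
    return feat * mask[None, :, :]

def dual_attention(feat):
    """Apply channel then spatial attention, preserving the feature shape."""
    return spatial_attention(channel_attention(feat))
```

Because both gates lie in (0, 1), the cascade can only attenuate, never amplify, a response, which is why such attention selectively emphasizes informative channels and regions relative to the rest.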
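The "collaborative decision" of StAUNet in (3) combines spatial, temporal, and relational evidence. One toy sketch of that idea, assuming per-frame spatial scores, an exponential moving average as a stand-in for temporal modeling, and a row-normalized co-occurrence matrix that propagates evidence between correlated AUs (all three choices, and the equal-weight fusion, are assumptions for illustration):

```python
import numpy as np

def st_collaborative(frame_scores, cooc, alpha=0.5):
    """Toy spatial-temporal-relational fusion.
    frame_scores: T x n per-frame AU scores in [0, 1] (spatial stream).
    cooc:         n x n nonnegative AU co-occurrence prior.
    Returns T x n fused scores."""
    T, n = frame_scores.shape
    # temporal stream: exponential moving average over the sequence
    temporal = np.zeros_like(frame_scores)
    state = frame_scores[0]
    for t in range(T):
        state = alpha * state + (1 - alpha) * frame_scores[t]
        temporal[t] = state
    # relational stream: spread evidence between correlated AUs
    transition = cooc / cooc.sum(axis=1, keepdims=True)
    relational = temporal @ transition
    # collaborative decision: equal-weight average of the three streams
    return (frame_scores + temporal + relational) / 3.0
```

In a real sequence model the temporal stream would be learned (e.g. a recurrent layer) rather than a fixed moving average, but the fusion structure is the point of the sketch.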