At present, artificial intelligence has shown a trend of comprehensive expansion and rapid breakthroughs. Intelligent services have been widely deployed, changing people's patterns of production and styles of life. In the era of human-computer symbiosis, artificial intelligence is beginning to understand human emotions, and affective computing has ushered in a golden age. Multiple factors have jointly promoted the technological evolution and industrial development of intelligent affective analysis, making artificial intelligence technology even more attractive. Among the various emotional expressions used in human communication (e.g., facial expressions, body movements, speech, and text), the emotional information conveyed by facial expressions accounts for up to 55% of the total. Facial expressions are therefore of great significance for analyzing the emotional state of a target. In recent years, benefiting from the rapid development of deep learning and recognition technology, facial expression recognition (FER) methods have gained more and more attention. Although existing methods have surpassed human recognition performance on standard datasets, current technologies still do not meet practical requirements. To this end, this dissertation proposes a series of facial expression-based visual emotion analysis methods using deep learning as the theoretical framework, addressing the problems of extracting highly discriminative features, large-scale label dependence, and general encoder pre-training. In summary, the main contributions of this dissertation are as follows:

1. An adaptive facial expression representation learning method based on coarse-fine labels and distillation is proposed. Current unconstrained facial expression datasets are class-imbalanced, and the similarity between different facial expression categories is high. Existing methods design deeper or wider models to improve facial expression recognition performance, but this increases model storage and computing costs. To
address the class-imbalance problem, this dissertation proposes an adaptive regularization loss that re-weights category importance coefficients, improving the discriminative power of facial expression representations. Inspired by the human cognitive model, this dissertation designs a coarse-fine labeling strategy that guides a model from easy to difficult in classifying highly similar facial expressions. This dissertation then further proposes a novel training framework, i.e., an emotional education mechanism. Specifically, this framework consists of a knowledgeable teacher network and a self-taught student network. The former fuses the outputs of a coarse stream and a fine stream to learn facial expression representations from easy to difficult. Under the supervision of the pre-trained teacher network and its learning experience, the latter maximizes potential performance while compressing the teacher network. Extensive experiments show that the proposed method outperforms state-of-the-art methods.

2. An unconstrained facial expression recognition method based on no-reference de-elements learning is proposed. Most unconstrained facial expression recognition methods take original facial images as inputs and learn discriminative features through well-designed loss functions, which cannot reflect the important visual information in faces. Although existing methods have explored the visual information of constrained facial expressions, there is no explicit modeling of what visual information is important for unconstrained FER. To discover the valuable information in unconstrained facial expressions, this dissertation poses a new problem of no-reference de-elements learning: decomposing any unconstrained facial image into a facial expression element and a neutral face, without reference to the corresponding neutral face. Importantly, the element provides visualization results for understanding important facial expression information and improves the discriminative power of
features. Moreover, this dissertation proposes a simple yet effective De-Elements Network to learn the element and introduces appropriate constraints to overcome the lack of ground-truth neutral faces during de-elements learning. Comparisons on in-the-wild facial expression datasets show that the proposed method improves classification performance and performs on par with state-of-the-art methods. The proposed method also generalizes strongly on realistic occlusion and pose-variation datasets and in cross-dataset evaluation.

3. A semi-supervised facial expression recognition method based on an adaptive confidence margin is proposed. Semi-supervised facial expression recognition has received significant attention in recent years for its promising performance in leveraging a few labeled data together with large-scale unlabeled data for model training. Most recent semi-supervised learning methods train a classification model by selecting only the unlabeled samples whose prediction confidence scores exceed a threshold (i.e., the confidence margin). This dissertation argues that large-scale unlabeled data should be fully leveraged to further improve performance. To this end, this dissertation proposes a semi-supervised facial expression recognition method based on an adaptive confidence margin. Specifically, the proposed method first leverages labeled data to learn a confidence margin. It then partitions the unlabeled data into two subsets by comparing their confidence scores with the margin: (1) subset I, containing samples whose confidence scores are no less than the margin; (2) subset II, containing samples whose confidence scores are less than the margin. For the samples in subset I, their predictions are constrained to match the generated pseudo-labels. Meanwhile, for the samples in subset II, a feature-level contrastive objective is used to learn effective facial expression
features. Extensive experiments on challenging image-based and video-based facial expression datasets show that the proposed method achieves state-of-the-art performance, even surpassing fully-supervised baselines in a semi-supervised manner. Additionally, the proposed method can leverage cross-domain unlabeled data for effective training to boost fully-supervised performance.

4. A general encoder pre-training method for facial expression analysis is proposed. Existing methods leverage different large-scale training data to train encoders for specific facial expression recognition applications. This dissertation proposes a new task, i.e., pre-training a general encoder that extracts facial expression representations for any dataset without fine-tuning. To tackle this task, this dissertation extends self-supervised contrastive learning to pre-train a general encoder for facial expression analysis. Specifically, given coarse-grained labels and a specific data augmentation strategy, different-view positive and negative pairs are first constructed. Then, coarse-contrastive learning is proposed, which pulls the features of positive pairs together and pushes them away from the features of negative pairs. However, a key problem is that an excessive constraint on the distribution of coarse-grained features harms fine-grained facial expression recognition. To address this issue, this dissertation designs a weight vector to constrain the optimization of the coarse-contrastive learning. As a result, a well-trained general encoder with frozen weights can adapt to different categories and supports linear evaluation on any target facial expression dataset. Extensive experiments demonstrate that the proposed method achieves performance superior or comparable to state-of-the-art methods, especially on unseen datasets and in cross-dataset evaluation. Moreover, the proposed method reduces the training burden and provides a solution for fully-supervised facial expression
feature learning with fine-grained labels.
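To make the partition step of the third contribution concrete, the following is a minimal illustrative sketch, not the dissertation's implementation: it assumes per-class confidence margins have already been learned from labeled data, and the function name `partition_by_margin` and the toy numbers are the author's (hypothetical) choices.

```python
import numpy as np

def partition_by_margin(probs, margins):
    """Split unlabeled samples into two subsets by a confidence margin.

    probs   : (N, C) softmax predictions for N unlabeled samples over C classes.
    margins : (C,) per-class confidence margins (assumed learned from labeled data).

    Returns index arrays (subset_I, subset_II). Subset I holds samples whose
    top confidence is no less than the margin of their predicted class (these
    would receive pseudo-labels); subset II holds the remaining samples (these
    would be trained with a feature-level contrastive objective instead).
    """
    pseudo = probs.argmax(axis=1)      # predicted class per sample
    conf = probs.max(axis=1)           # confidence of that prediction
    mask = conf >= margins[pseudo]     # compare against the class-wise margin
    return np.where(mask)[0], np.where(~mask)[0]

# Toy usage: 3 samples, 2 classes, assumed per-class margins 0.8 and 0.9.
probs = np.array([[0.95, 0.05],   # confident class 0 -> subset I
                  [0.30, 0.70],   # 0.70 < margin 0.9 -> subset II
                  [0.85, 0.15]])  # 0.85 >= margin 0.8 -> subset I
subset_I, subset_II = partition_by_margin(probs, np.array([0.8, 0.9]))
```

Because every unlabeled sample falls into exactly one subset, the full unlabeled set contributes to training, which is the point of the adaptive-margin design.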