
Research On Domain Transfer Learning And Multi-target Recognition Methods For Facial Expression

Posted on: 2024-05-19    Degree: Doctor    Type: Dissertation
Country: China    Candidate: M Bie    Full Text: PDF
GTID: 1528307121471714    Subject: Computer software and theory
Abstract/Summary:
Facial expression recognition has significant and extensive applications in the field of affective computing, and has consequently become one of the prime research topics in computer vision. In practice, however, many challenging problems remain. The facial region typically occupies a relatively small area of the whole image, making it difficult to extract valuable information; moreover, the lack of high-quality public datasets, together with the small sample sizes and class imbalance inherent in self-collected datasets, poses additional challenges. Based on an analysis of existing research in facial expression recognition, this paper puts forward targeted improvement strategies. First, to address the small-sample problem, a domain adaptive network is constructed: the model is trained on samples with true labels from the source domain, and the learned knowledge is transferred to the target domain so that the model can recognize expressions there effectively. Second, considering the differences among data collected under different scenes, lighting conditions, and devices, as well as the differing distributions of the key regions of different expressions, a multi-level and multi-dimensional information fusion method is proposed to extract crucial features. Third, because classroom teaching scenarios pose many challenges for detecting multiple targets and small faces, the FE-YOLOv5 model is established using a feature enhancement strategy, achieving more accurate multi-target detection and localization. The specific contributions are outlined as follows.

To address the small sample size in the expression recognition task, this paper proposes the DA-FER model (Domain Adaptive for Facial Expression Recognition), which adopts a transfer learning strategy. Public datasets with sufficient samples serve as the upstream task to train the network, and a domain adaptive method is then employed to train on the target-domain dataset with fewer samples, making the network more suitable for the downstream task. The proposed domain adaptive method integrates the SSPP module and the Slice module to fuse expression features of different dimensions while retaining the regions of interest around the facial features, which yields more discriminative feature extraction and enhances the network's domain adaptive capability. Furthermore, by employing strategies such as mean operations and adaptive average pooling, the parameter count is reduced by about half, so the DA-FER model achieves favorable results in both network complexity and target-domain accuracy. When the self-collected datasets are used as the target domain and RAF-DB and Fer2013 as the source domains, expression recognition performance improves, demonstrating the effectiveness of the domain adaptive method.

The extraction of expression features is easily affected by pose, lighting changes, and occlusion, and the spatial distribution of important information is also diverse, making discriminative expression features difficult to obtain. To address this, a facial expression recognition method based on multi-level and multi-dimensional information fusion (MMIF) is proposed. First, the low-level, mid-level, and high-level features extracted by the network are visualized and analysed. Because deeper features contain vital semantic information, the fusion strategy is applied from mid-level to high-level features. This method is applied to a CNN network (MMIF-CNN), in which each up-sampled convolutional group is smoothed to aggregate multi-level features through residual learning; at the same time, the global attention mechanism (GAM) is introduced to realize cross-dimensional interactions among channel, spatial width, and spatial height, allowing the model to focus on features of different levels and on salient regions of the image. The method is also applied to a transformer network (MMIF-Trans), in which a data-dimension transformation strategy enhances the network's perception of the spatial dimensions. To improve the generalization ability of the model, the Split module is introduced, and strategies such as group convolution and the mean module enhance performance while controlling the number of parameters. Both models achieve better experimental results on an in-the-wild dataset (Fer2013) and an in-the-lab dataset (CK+).

Starting from the general problems of facial expression recognition above, this paper then delves into multi-face expression recognition in a specific scenario. Chapter 5 carries out applied research on facial expression recognition in the classroom teaching environment, where expression recognition is gradually being integrated for educational assessment. Most existing algorithms are based on single frontal faces and are less effective at processing multi-face images in a real classroom. In response to the small facial size and low video resolution of classroom teaching scenarios, which cause inaccurate face detection and difficulty in extracting expression features, this paper proposes Feature Enhancement for Multi-Face Expression Recognition (FEMFER), designed specifically for classroom environments. First, a feature enhancement method with an upsampling (UPS) module and a Convolution-Batch normalization-Leaky ReLU (CBL) module is applied to improve YOLOv5. The UPS module reduces the network's local receptive field and effectively learns detailed information from the backbone; the CBL module speeds up model convergence while increasing the nonlinearity of the features. The network can thus extract and fuse features efficiently, making it better suited to small-face detection in the classroom and resolving the original network's inaccurate multi-target small-face recognition. The problem of data imbalance in real scenes is alleviated by combining two methods: data augmentation and a focal loss function. Because widely used datasets from classroom teaching scenes are lacking, this paper compiles three datasets collected from teaching environments: a multi-face image dataset for object-detection training, a single-face image dataset for expression-classification training, and a video dataset for application testing. Compared with the original YOLOv5, the proposed method achieves more accurate face localization, improving mean average precision (mAP) by 7.18%, and detects faster, reaching 52.13 frames per second, demonstrating the method's potential for real-time facial expression recognition.
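The internals of DA-FER's SSPP and Slice modules are not detailed in this abstract. As a generic illustration of the domain-adaptation idea it builds on — pulling the source- and target-domain feature distributions together — the following numpy sketch uses a linear maximum mean discrepancy (MMD) penalty, a standard alignment measure that the abstract itself does not name:

```python
import numpy as np

def linear_mmd(source_feats, target_feats):
    """Linear maximum mean discrepancy between two feature batches.

    A common domain-adaptation penalty: the squared distance between the
    mean feature vectors of the source and target domains. Adding it to
    the task loss pushes the network to produce domain-invariant features.
    """
    delta = source_feats.mean(axis=0) - target_feats.mean(axis=0)
    return float(delta @ delta)

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(64, 128))           # source-domain features
tgt = rng.normal(0.5, 1.0, size=(64, 128))           # shifted target-domain features
aligned = tgt - tgt.mean(axis=0) + src.mean(axis=0)  # crude mean alignment
```

After the crude mean shift, `linear_mmd(src, aligned)` collapses to nearly zero while `linear_mmd(src, tgt)` stays large; in a real adaptive network the same reduction is achieved by gradient descent on the penalty rather than by an explicit shift.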
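The GAM used in MMIF-CNN performs cross-dimensional channel/width/height interactions; its full form is not reproduced in this abstract. A minimal squeeze-and-excitation-style channel-attention sketch conveys the core mechanism of reweighting channels by global context (the weight shapes and reduction ratio `r` here are illustrative assumptions, not values from the dissertation):

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """Channel attention on a (C, H, W) feature map.

    Global-average-pool each channel, pass the pooled vector through a
    two-layer bottleneck MLP, squash to (0, 1) gates with a sigmoid, and
    rescale the channels. GAM extends this idea with spatial-width and
    spatial-height interactions; only the channel branch is shown here.
    """
    pooled = feat.mean(axis=(1, 2))                 # (C,) per-channel descriptor
    hidden = np.maximum(pooled @ w1, 0.0)           # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w2)))    # sigmoid gate per channel
    return feat * gates[:, None, None]

rng = np.random.default_rng(1)
c, h, w, r = 8, 4, 4, 2                             # channels, height, width, reduction
feat = rng.normal(size=(c, h, w))
w1 = rng.normal(size=(c, c // r)) * 0.1             # bottleneck weights (illustrative)
w2 = rng.normal(size=(c // r, c)) * 0.1
out = channel_attention(feat, w1, w2)
```

Because each gate lies in (0, 1), the output preserves the feature-map shape while attenuating less informative channels, which is the "focus on salient regions and levels" effect described above.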
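FEMFER pairs data augmentation with a focal loss to mitigate class imbalance. A minimal numpy sketch of the standard binary focal loss (Lin et al.) shows why it helps: easy, well-classified examples are down-weighted so the rare, hard ones dominate the gradient. The `gamma` and `alpha` values below are the usual defaults, not values stated in the abstract:

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Binary focal loss: cross-entropy scaled by (1 - p_t)**gamma.

    p_t is the predicted probability of the true class, so confident
    correct predictions (p_t near 1) contribute almost nothing, while
    hard or misclassified examples keep their full weight.
    """
    p_t = np.where(labels == 1, probs, 1.0 - probs)       # prob. of the true class
    alpha_t = np.where(labels == 1, alpha, 1.0 - alpha)   # class-balance weight
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

labels = np.array([1, 1, 0, 0])
easy = np.array([0.95, 0.90, 0.10, 0.05])  # confident, correct predictions
hard = np.array([0.55, 0.60, 0.45, 0.40])  # uncertain predictions
```

With `gamma=0` the focusing factor vanishes and the loss reduces to alpha-weighted cross-entropy; raising `gamma` shrinks the easy examples' contribution much faster than the hard ones', which is the imbalance remedy the text describes.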
Keywords/Search Tags:Facial Expression, Domain Transfer Learning, Feature Fusion, Feature Enhancement, Multi-target Recognition