Face recognition is a pivotal research area within AI-based computer vision. With the recent surge in AI technology, face attack risks such as AI-generated fake photos have drawn significant attention from researchers. In response, face anti-spoofing detection has emerged as a critical component of security infrastructure, serving as a kind of "guardian" against potential security breaches. It is already widely deployed in practical settings such as military security, real-time payments, and access control verification, where it provides crucial security support for face recognition models. The technology judges whether the facial image captured by the camera sensor is live, and supplies that decision to the face recognition model. In real-world environments, attacks such as 2D photos, 3D masks, and video playback can force face recognition models into incorrect decisions, creating persistent hidden dangers and even unpredictable losses. Designing detection models that prevent these varied attacks in complex real-world environments, while improving detection accuracy and reducing the error rate, is therefore a challenging and significant task.

In recent years, a number of milestone results have emerged in face anti-spoofing detection, with breakthroughs in both performance and efficiency. Mainstream algorithms can be roughly divided into machine learning detection algorithms, hybrid model detection algorithms, and deep learning detection algorithms. This thesis analyzes existing algorithms and improves the current mainstream multi-modal detection algorithm from several perspectives, including convolutional neural network architecture, multi-feature fusion strategy, attention mechanism networks, and grouped convolution; it also adopts mature data augmentation techniques. The resulting model achieves excellent results on two multi-modal datasets. The specific work of this thesis is as follows:

1. Considering the characteristics of multi-modal datasets and the impact of average pooling on model performance, this thesis proposes a multi-modal face anti-spoofing detection algorithm based on a channel cross-fusion network and global depth-wise convolution. In each convolutional layer, the channel cross-fusion network fuses the modal features of the two other layers along the channel dimension to obtain that layer's fusion feature for subsequent learning. The global depth-wise convolution module replaces global average pooling to address the performance loss caused by its averaging operation, which weights all positions equally even though center features are more important than edge features. Finally, experiments verify the effectiveness of the channel cross-fusion network and the global depth-wise convolution module, and model performance is compared on the CASIA-SURF test set and the CASIA-SURF CeFA validation set.

2. The previous algorithm performs cross-fusion only on the channel dimension. This thesis further explores the network on both the channel and spatial dimensions and proposes a multi-modal face anti-spoofing detection algorithm based on a channel spatial cross-fusion network and an attention mechanism. In each convolutional layer, the algorithm uses channel spatial cross-fusion to fuse the modal features of the two other layers on both the channel and spatial dimensions, obtaining that layer's fusion feature for subsequent learning, and it introduces a channel spatial attention mechanism. Experiments verify the effectiveness of the multi-modal fusion network (the combination of the channel spatial cross-fusion module and the channel spatial attention mechanism), and better results are achieved on the same two datasets, further improving the performance of the network.

3. Although the modules introduced by the above algorithms improve model performance, they also increase the number of parameters and the complexity of the model. This thesis therefore introduces grouped convolution to reduce both.

From a theoretical perspective, the contributions are mainly reflected in the following aspects: (1) Based on the idea of pairwise cross-fusion, the channel cross-fusion network and the channel spatial cross-fusion network are proposed to cross-fuse corresponding features at each layer, obtaining the fused feature information for subsequent steps. (2) To address the performance issue caused by applying global average pooling to facial image features, the global depth-wise convolution module is introduced to replace the average pooling operation. (3) Based on the idea of attention networks, two types of attention mechanism networks are introduced in sequence, and data augmentation, image patch feature learning, and other methods are used to stabilize model performance. (4) To address the growth in model parameters and complexity, the algorithm adopts a grouped convolution structure to reduce both, and introduces a modality feature random erasure strategy to prevent overfitting during training.

The work in this thesis has good application and innovation value and has achieved the expected goals. Exploring multi-modal data generation and lighter model architectures is the next step, with the aim of a more deployment-friendly anti-spoofing detection model.
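
Two of the ideas summarized above lend themselves to a small worked illustration: replacing global average pooling with a global depth-wise convolution, and reducing parameters with grouped convolution. The following is a minimal pure-Python sketch under stated assumptions, not the implementation used in this thesis. The function names, the 3x3 toy feature map, the hand-set center-weighted kernel, and the 64-to-128-channel layer with 4 groups are all hypothetical choices made for illustration.

```python
# Hypothetical illustration only: names, shapes, and values are assumptions,
# not taken from the thesis code.

def global_average_pool(feature_map):
    """Collapse a C x H x W feature map to C values with uniform spatial weights."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def global_depthwise_conv(feature_map, kernels):
    """Collapse a C x H x W map to C values using one H x W kernel per channel.

    Each channel has its own kernel covering the whole map, so different
    spatial positions can receive different weights (learned in practice).
    """
    out = []
    for channel, kernel in zip(feature_map, kernels):
        out.append(sum(
            x * w
            for row_x, row_w in zip(channel, kernel)
            for x, w in zip(row_x, row_w)
        ))
    return out


def conv_param_count(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution (bias omitted) split into `groups`."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * k * k


# One 3x3 channel whose only strong response is at the center.
fmap = [[[0.0, 0.0, 0.0],
         [0.0, 9.0, 0.0],
         [0.0, 0.0, 0.0]]]

# A center-weighted kernel with made-up values; a trained model would learn these.
center_kernel = [[[0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0]]]

gap_out = global_average_pool(fmap)                    # uniform weighting
gdc_out = global_depthwise_conv(fmap, center_kernel)   # position-dependent weighting

dense_params = conv_param_count(64, 128, 3)            # standard convolution
grouped_params = conv_param_count(64, 128, 3, groups=4)
```

In this toy case, average pooling dilutes the center response to 1.0 while the center-weighted kernel preserves it as 9.0, and splitting the 3x3 convolution into 4 groups reduces its weights from 73,728 to 18,432.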