With the wide adoption of face authentication and face editing technology on mobile devices, facial landmark detection models face three challenges: pose and expression variation, occlusion, and real-time requirements. Current research on facial landmark detection focuses mainly on improving accuracy while largely ignoring model size and real-time inference. Focusing on the design of an efficient facial landmark detection model, this paper studies how such a model can meet the demands of mobile deployment. The main research progress is as follows.

To address inaccurate localization caused by pose and expression, this paper builds the network on ShuffleNetV2 blocks and further applies dynamic convolution, the Wing loss function, and data augmentation to strengthen the model against exaggerated poses and expressions. Dynamic convolution computes attention dynamically for each input and offers stronger feature representation than static convolution. The Wing loss amplifies the influence of small errors, allowing the model to keep training on landmarks with small residuals. On the WFLW test set, the model achieves an average inter-ocular normalized error of 5.81, a model size of 984.1 KB, and an inference speed of 50.31 FPS, striking a balance among accuracy, model size, and speed.

To address inaccurate localization caused by occlusion, this paper adds an occlusion prediction network and applies data augmentation to improve localization accuracy on visible landmarks. The occlusion prediction network shares its shallow parameters with the landmark coordinate regression network; through multi-task learning, the occlusion prediction network encourages the regression network to concentrate its attention on visible landmarks. Augmenting the dataset with random color-block occlusion exposes the model to more occlusion patterns during training. On the MERL-RAV dataset, the model with the occlusion prediction network improves visible-landmark localization accuracy by 1.67% and reduces the inter-ocular normalized error by 0.48%, handling occlusion better than the model without it.

To address the constraints of limited system resources, this paper further compresses the model with network slimming pruning and knowledge distillation. Network slimming induces a sparse structure during training so that redundant parameters can be removed to shrink the model. Knowledge distillation lets the pruned model recover accuracy during fine-tuning: with the unpruned model as the teacher and the pruned model as the student, the small model learns the prediction ability of the teacher. On the WFLW test set, the compressed model reduces storage overhead by 27.33% while accuracy drops by only 2.04%. The resulting model outperforms the PFLD model in accuracy, speed, and storage overhead.
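As a rough illustration of the dynamic convolution described above, the PyTorch-style sketch below aggregates K candidate kernels with input-dependent attention weights. The module name, the number of kernels, the temperature, and the attention branch layout are all assumptions for illustration, not details taken from the thesis.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Minimal sketch of dynamic convolution: K parallel kernels are
    combined per sample with attention computed from the input.
    Hyper-parameters here are illustrative, not the thesis's settings."""
    def __init__(self, in_ch, out_ch, kernel_size=3, num_kernels=4, temperature=30.0):
        super().__init__()
        self.num_kernels = num_kernels
        self.temperature = temperature
        # K candidate kernels, shape (K, out_ch, in_ch, k, k)
        self.weight = nn.Parameter(
            torch.randn(num_kernels, out_ch, in_ch, kernel_size, kernel_size) * 0.01)
        # Squeeze-and-excitation style attention over the K kernels
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, num_kernels))
        self.padding = kernel_size // 2

    def forward(self, x):
        b = x.size(0)
        # Per-sample attention over kernels, softened by a temperature
        attn = F.softmax(self.attention(x) / self.temperature, dim=1)   # (B, K)
        k, oc, ic, kh, kw = self.weight.shape
        # Aggregate the K kernels into one kernel per sample
        agg = torch.einsum('bk,koihw->boihw', attn, self.weight)        # (B, out, in, k, k)
        # Grouped-convolution trick: fold the batch into the channel dim
        x = x.reshape(1, b * ic, x.size(2), x.size(3))
        w = agg.reshape(b * oc, ic, kh, kw)
        out = F.conv2d(x, w, padding=self.padding, groups=b)
        return out.reshape(b, oc, out.size(2), out.size(3))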
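The Wing loss mentioned above behaves logarithmically for small residuals and linearly for large ones, so small errors still produce meaningful gradients. The sketch below uses commonly cited default hyper-parameters (w = 10, epsilon = 2), which are assumptions and may differ from the values used in the thesis.

import math
import torch

def wing_loss(pred, target, w=10.0, epsilon=2.0):
    """Wing loss: log-shaped near zero to keep training on small errors,
    L1-like for large errors. w and epsilon are illustrative defaults."""
    diff = (pred - target).abs()
    # Constant that makes the two branches meet smoothly at |x| = w
    c = w - w * math.log(1.0 + w / epsilon)
    loss = torch.where(diff < w,
                       w * torch.log(1.0 + diff / epsilon),
                       diff - c)
    return loss.mean()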
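The multi-task setup with the occlusion prediction network could look roughly like the sketch below: a shared backbone feeds both a landmark-coordinate head and a per-landmark occlusion head, and the coordinate loss is weighted by ground-truth visibility so the regression branch focuses on visible landmarks. The head shapes, the visibility weighting, and the balance term alpha are all illustrative assumptions, not the thesis's exact formulation.

import torch
import torch.nn as nn

class LandmarkWithOcclusionHead(nn.Module):
    """Sketch of the multi-task model: shared shallow layers, two heads."""
    def __init__(self, backbone, feat_dim=128, num_landmarks=68):
        super().__init__()
        self.backbone = backbone                               # shared shallow parameters
        self.coord_head = nn.Linear(feat_dim, num_landmarks * 2)
        self.occ_head = nn.Linear(feat_dim, num_landmarks)     # occlusion logits

    def forward(self, x):
        feat = self.backbone(x)
        return self.coord_head(feat), self.occ_head(feat)

def multitask_loss(pred_coords, occ_logits, gt_coords, gt_visible, alpha=0.5):
    """Coordinate loss weighted by visibility plus an occlusion-classification
    term; alpha is an illustrative balance weight."""
    b, n = gt_visible.shape
    coord_err = (pred_coords.view(b, n, 2) - gt_coords.view(b, n, 2)).pow(2).sum(-1)
    coord_loss = (coord_err * gt_visible).sum() / gt_visible.sum().clamp(min=1)
    occ_loss = nn.functional.binary_cross_entropy_with_logits(
        occ_logits, 1.0 - gt_visible)                          # predict "occluded"
    return coord_loss + alpha * occ_loss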
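Finally, the compression stage combines two standard ingredients that the sketch below illustrates: network slimming adds an L1 penalty on BatchNorm scale factors during training so unimportant channels shrink toward zero and can be pruned, and knowledge distillation fine-tunes the pruned (student) model against both the ground truth and the unpruned (teacher) model's predictions. The regularization strength and the distillation weight are illustrative values, not those used in the thesis.

import torch
import torch.nn as nn

def bn_sparsity_penalty(model, l1_lambda=1e-4):
    """Network-slimming regularizer: L1 penalty on every BatchNorm gamma,
    added to the training loss to induce a sparse, prunable structure."""
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            penalty = penalty + m.weight.abs().sum()
    return l1_lambda * penalty

def distillation_loss(student_pred, teacher_pred, gt, distill_weight=0.5):
    """Fine-tuning loss for the pruned student: ground-truth term plus an
    imitation term toward the teacher's landmark predictions."""
    task_loss = nn.functional.l1_loss(student_pred, gt)
    distill = nn.functional.mse_loss(student_pred, teacher_pred.detach())
    return task_loss + distill_weight * distill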