
Research On Lightweight Facial Landmark Detection

Posted on: 2021-02-14    Degree: Master    Type: Thesis
Country: China    Candidate: X Liu    Full Text: PDF
GTID: 2428330629980108    Subject: Computer Science and Technology
Abstract/Summary:
Facial landmark detection is a computer vision task that locates the positions of landmarks on the human face. It is an important foundation for face recognition, expression recognition, and facial animation, and has therefore attracted considerable research attention in recent years. However, existing facial landmark detection algorithms usually consider only how to improve the generalization performance of the model, not its efficiency. Models with good generalization performance often have a huge number of parameters, which limits the practicality and scalability of these algorithms in real applications. This thesis uses the visual attention mechanism and knowledge distillation to address the still-unsolved efficiency problem of landmark localization models.

First, based on the classic stacked hourglass network (SHN), we propose an Attention-Guided Coarse-to-Fine Network (AGCFN) that does not increase the parameter count of the hourglass model. Direct coordinate regression for landmark detection performs poorly; current state-of-the-art methods instead treat it as a detection problem and predict a heatmap per landmark. This significantly improves detection accuracy, but the independent heatmaps ignore the correlation between channel features, so the final predicted points lack positional constraints relative to one another. In this work, we use the attention mechanism to learn this correlation and propose a coarse-to-fine, attention-guided network that improves facial landmark detection accuracy without increasing the model's parameter count. Specifically, two attention modules are introduced: a channel attention module, which models the correlation between channels, and a spatial attention module, which uses a conditional random field (CRF) to learn the spatial position information of the prediction map. Experiments show that AGCFN achieves higher accuracy than the SHN on the 300-W dataset, the 300-W special test set, and the WFLW dataset.

Introducing structural information between facial landmarks markedly improves the generalization performance of the hourglass network. However, the network remains structurally complex and time-consuming when applied to facial landmark detection. To further reduce its complexity and parameter count, we introduce knowledge distillation to train a lightweight network with fewer parameters and a smaller structure while maintaining generalization ability. In knowledge distillation, a lightweight student network is trained under the supervision of a large, high-capacity teacher network. We use AGCFN as the teacher network; the student network has the same architecture but only half the teacher's depth. Through distillation, the structural knowledge of the robust teacher network is transferred to the lightweight student network, so the student can learn quickly at a lower computational cost. In addition, to further reduce the parameter count, depth-wise separable convolutions replace the standard convolutions in the student's hourglass modules; the combination of depth-wise and point-wise convolutions reduces the amount of computation. Compared with the latest facial landmark detection methods, experiments show that our lightweight student network achieves outstanding performance on the widely used 300-W dataset even though its parameter count drops significantly.

In summary, this thesis first trains a teacher network by combining hourglass networks with attention mechanisms; this model localizes landmarks accurately across a variety of scenarios. A lightweight student network is then trained through knowledge distillation. Experiments show that, with the distillation strategy, the lightweight network with fewer parameters also achieves competitive performance on commonly used datasets.
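As a minimal sketch of the channel attention idea described above (the thesis does not publish its exact formulation; the squeeze-and-excitation-style structure, the weight matrices `w1`/`w2`, and the reduction ratio here are illustrative assumptions):

```python
import math

def channel_attention(feats, w1, w2, reduction=2):
    """SE-style channel attention sketch: squeeze each channel to a scalar
    by global average pooling, pass the descriptor through two small fully
    connected layers with a sigmoid, then rescale each channel by its gate.
    feats: list of C channels, each an HxW list of lists of floats."""
    C = len(feats)
    # Squeeze: global average pooling per channel -> C-dim descriptor
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feats]
    # Excite: FC -> ReLU -> FC -> sigmoid, producing one gate per channel
    hidden = [max(0.0, sum(w1[j][c] * z[c] for c in range(C)))
              for j in range(C // reduction)]
    scores = [sum(w2[c][j] * hidden[j] for j in range(len(hidden)))
              for c in range(C)]
    gates = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    # Rescale: multiply every value in channel c by gate c
    return [[[v * gates[c] for v in row] for row in feats[c]]
            for c in range(C)]
```

The gates let the network emphasize informative channels and suppress uninformative ones, which is how channel-wise correlation can constrain otherwise independent heatmaps.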
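The teacher-supervised training described above can be sketched as a combined heatmap loss. This is an assumed formulation, not the thesis's exact objective: the student's predicted heatmap is fit to the ground truth and, with a weighting `alpha` (hypothetical), to the frozen teacher's heatmap:

```python
def mse(a, b):
    """Mean squared error between two flattened heatmaps of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def distill_loss(student, teacher, target, alpha=0.5):
    """Distillation loss sketch: the student matches the ground-truth
    heatmap (hard target) and mimics the teacher's heatmap (soft target).
    alpha balances the two terms and is an illustrative assumption."""
    return alpha * mse(student, target) + (1 - alpha) * mse(student, teacher)
```

The teacher term gives the student a denser training signal than the ground truth alone, which is what lets a half-depth network approach the teacher's accuracy.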
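The parameter savings from the depth-wise separable convolutions mentioned above can be made concrete with a small counting sketch (channel and kernel sizes below are illustrative, not the thesis's configuration; biases are ignored):

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution: one k x k filter
    per (input channel, output channel) pair."""
    return c_in * c_out * k * k

def dw_separable_params(c_in, c_out, k):
    """Depth-wise separable convolution: one k x k filter per input
    channel (depth-wise), then a 1 x 1 point-wise convolution mixing
    channels."""
    return c_in * k * k + c_in * c_out

# Example: 128 -> 128 channels with a 3 x 3 kernel
standard = conv_params(128, 128, 3)          # 147456 parameters
separable = dw_separable_params(128, 128, 3)  # 17536 parameters
```

For this configuration the separable version uses roughly 8x fewer parameters (the general ratio is 1/c_out + 1/k^2), which is why swapping it into the student's hourglass modules shrinks the model so much.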
Keywords/Search Tags:Facial landmark detection, Attention mechanism, Lightweight, Knowledge distillation