| Facial expression is an important carrier of information in interpersonal communication, and it is also the most direct and effective way for machines to understand human emotions. Facial expression recognition therefore has broad application prospects in many scenarios involving human-computer interaction, such as traffic safety, smart healthcare, and product marketing. Traditional facial expression recognition algorithms based on handcrafted features have poor robustness and are often suitable only for specific application scenarios. Recently, with the rapid development of deep learning, methods based on convolutional neural networks have greatly improved the accuracy of facial expression recognition. However, as the number of convolutional layers increases, the number of parameters and the amount of computation grow sharply, raising the resource demands and computational cost of the system. In addition, facial expressions are easily affected by environmental factors and individual differences, and these irrelevant factors can interfere with the learning of the network. This thesis therefore proposes a complete facial expression recognition method based on a lightweight network. First, face alignment and cropping methods are designed for the preprocessing stage, and a face masking algorithm is proposed to further extract the facial region of interest and reduce interference from the background. Second, a network structure that fuses deep and shallow features is proposed on the basis of the lightweight GhostNet model. This structure is designed to fully extract shallow features at various scales from the original image and concatenate them with deep features, reducing information loss during forward propagation. A two-step channel attention module is then embedded in the network to encode the channel information in the concatenated feature map and obtain a channel attention map. In addition, this thesis proposes a multiscale spatial attention module that combines multiscale feature extraction with spatial attention; this module weights the positions of the channel-weighted feature map to produce a spatially weighted feature map. Finally, the feature map weighted along both channels and spatial positions is fed into the subsequent network for feature extraction and classification. Experimental results show that the proposed method improves expression recognition accuracy by 0% to 3% on the extended Cohn-Kanade (CK+) dataset and by 1% to 8% on the Oulu-CASIA NIR&VIS dataset. Moreover, the overall network is lightweight, laying a good foundation for practical applications. |
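The face masking step can be illustrated, under assumptions, as a polygon mask built from ordered facial contour points (for example landmark points along the jaw and brows, supplied by some upstream detector that is not shown). Pixels whose centers fall outside the polygon are set to a fill value. The abstract does not describe the thesis's masking algorithm, so this minimal NumPy sketch is only a stand-in; `polygon_mask` and `mask_face` are hypothetical names.

```python
import numpy as np

def polygon_mask(h, w, poly):
    """Boolean (h, w) mask of pixels whose centers lie inside `poly` (even-odd rule).

    poly: sequence of (x, y) vertices, assumed to be ordered around the face contour.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    px, py = xs + 0.5, ys + 0.5                 # pixel centers
    pts = np.asarray(poly, dtype=float)
    inside = np.zeros((h, w), dtype=bool)
    n = len(pts)
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        # does this edge straddle the horizontal line through each pixel center?
        crosses = (y1 > py) != (y2 > py)
        # x position where the edge meets that line (horizontal edges give inf/nan,
        # which are harmless because `crosses` is False for them)
        with np.errstate(divide="ignore", invalid="ignore"):
            x_int = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        # toggle for every edge crossed by a ray going right from the pixel center
        inside ^= crosses & (px < x_int)
    return inside

def mask_face(image, contour, fill=0):
    """Set pixels outside the (hypothetical) face contour to `fill`.

    image: array of shape (H, W) or (H, W, C).
    """
    m = polygon_mask(image.shape[0], image.shape[1], contour)
    out = image.copy()
    out[~m] = fill
    return out
```

If the contour points are not ordered, their convex hull could serve as the polygon instead; that choice is likewise an assumption rather than the thesis's method.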
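The deep and shallow feature fusion can be illustrated with a minimal NumPy sketch. It assumes that the multi-scale shallow branch can be approximated by mean filters of several window sizes (here 1, 3 and 5) followed by average pooling down to the deep feature map's resolution; in the thesis the shallow layers are learned and their exact form is not given in this abstract. The helper names `box_blur`, `avg_pool` and `fuse_shallow_deep` are hypothetical. The fusion itself is plain concatenation along the channel axis.

```python
import numpy as np

def box_blur(x, k):
    """Mean filter with a k x k window and zero padding, applied per channel.

    Stand-in (assumption) for a learned shallow feature extractor at scale k.
    x has shape (C, H, W); k must be odd.
    """
    if k == 1:
        return x.astype(float)
    pad = k // 2
    xp = np.pad(x.astype(float), ((0, 0), (pad, pad), (pad, pad)))
    H, W = x.shape[1:]
    out = np.zeros(x.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += xp[:, dy:dy + H, dx:dx + W]
    return out / (k * k)

def avg_pool(x, s):
    """Non-overlapping s x s average pooling; H and W must be divisible by s."""
    C, H, W = x.shape
    return x.reshape(C, H // s, s, W // s, s).mean(axis=(2, 4))

def fuse_shallow_deep(image, deep, kernel_sizes=(1, 3, 5)):
    """Concatenate multi-scale shallow features of `image` with `deep` along channels.

    image: (C_in, H, W) input; deep: (C_deep, h, w) deep feature map.
    Returns an array of shape (C_in * len(kernel_sizes) + C_deep, h, w).
    """
    _, H, W = image.shape
    _, h, w = deep.shape
    if H % h or W % w or H // h != W // w:
        raise ValueError("image size must be the same integer multiple of the deep map size")
    s = H // h
    shallow = [avg_pool(box_blur(image, k), s) for k in kernel_sizes]
    return np.concatenate(shallow + [deep], axis=0)

if __name__ == "__main__":
    img = np.random.default_rng(0).random((1, 48, 48))  # e.g. a 48x48 grayscale face crop
    deep = np.zeros((16, 6, 6))                           # hypothetical deep feature map
    print(fuse_shallow_deep(img, deep).shape)
```

Concatenation, rather than element-wise addition, keeps the shallow and deep channels separate so that a following channel attention module can weight them independently.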
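One common reading of a "two-step" channel attention is the squeeze-and-excitation (SE) pattern: a squeeze step (global average pooling per channel) followed by an excitation step (two fully connected layers ending in a sigmoid). The NumPy sketch below assumes that SE-style design; the module in the thesis may differ in its details. The weight shapes and the reduction ratio `r` are illustrative, not taken from the source.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, b1, w2, b2):
    """SE-style channel attention (assumed design) on a feature map x of shape (C, H, W).

    Step 1, squeeze: global average pooling gives one descriptor per channel.
    Step 2, excitation: FC (C -> C/r) + ReLU, then FC (C/r -> C) + sigmoid.
    Returns the channel-weighted map and the attention vector a in (0, 1)^C.
    """
    z = x.mean(axis=(1, 2))                  # squeeze: (C,)
    hidden = np.maximum(0.0, w1 @ z + b1)    # (C/r,)
    a = sigmoid(w2 @ hidden + b2)            # excitation: (C,)
    return x * a[:, None, None], a

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, r = 8, 2                              # illustrative channel count and reduction ratio
    x = rng.standard_normal((C, 6, 6))
    w1, b1 = rng.standard_normal((C // r, C)), np.zeros(C // r)
    w2, b2 = rng.standard_normal((C, C // r)), np.zeros(C)
    y, a = channel_attention(x, w1, b1, w2, b2)
    print(y.shape, a.round(3))
```

The reduction to `C/r` hidden units keeps the module cheap, which matches the lightweight goal; in a trained network `w1`, `b1`, `w2`, `b2` would be learned.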
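The multiscale spatial attention can be sketched in the style of the CBAM spatial branch, extended to several kernel sizes. This is an assumption about the design: the sketch pools the channel-weighted map into mean and max descriptors, convolves them with one kernel per scale (for example 3, 5 and 7), averages the responses and applies a sigmoid to obtain one weight per spatial position. The function names are hypothetical, and the kernels would be learned in practice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_same(x, k):
    """'Same'-padded 2D cross-correlation: x (Cin, H, W), k (Cin, kh, kw), odd kh/kw -> (H, W)."""
    _, kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    H, W = x.shape[1:]
    out = np.zeros((H, W))
    for i in range(kh):
        for j in range(kw):
            # weighted sum over input channels of the shifted window
            out += np.einsum("c,chw->hw", k[:, i, j], xp[:, i:i + H, j:j + W])
    return out

def multiscale_spatial_attention(x, kernels):
    """CBAM-like spatial attention with several kernel sizes (assumed design).

    x: channel-weighted feature map (C, H, W).
    kernels: list of (2, k, k) arrays, one per scale (e.g. k = 3, 5, 7).
    Returns the spatially weighted map and the (H, W) attention map.
    """
    desc = np.stack([x.mean(axis=0), x.max(axis=0)])                   # (2, H, W)
    logits = sum(conv2d_same(desc, k) for k in kernels) / len(kernels)  # fuse the scales
    s = sigmoid(logits)                                                 # one weight per position
    return x * s[None, :, :], s

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 12, 12))
    kernels = [rng.standard_normal((2, k, k)) * 0.1 for k in (3, 5, 7)]  # would be learned
    y, s = multiscale_spatial_attention(x, kernels)
    print(y.shape, s.shape)
```

Larger kernels see more spatial context (for example the relation between mouth and eye regions), while small kernels preserve local detail; averaging the responses is one simple way to combine them.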