| Facial expression recognition is a core research topic in human-computer interaction and has been widely applied in areas such as distance education and intelligent driving. Although the field has made considerable progress, practical application scenarios still pose many challenges, such as complex and changeable lighting conditions and varying degrees of facial occlusion. To obtain robust recognition, many researchers have applied neural networks to facial expression recognition and achieved good results. This paper studies neural-network-based expression recognition for both static images and image sequences. (1) For static images, the DenseNet121 model is taken as the baseline, and a DenseNet model with a multi-scale attention mechanism is proposed. First, because DenseNet121 is very deep and has a large number of parameters, its four dense blocks are reduced to three and the number of convolution blocks within each dense block is reduced. Second, for input images of size 48×48×1, instead of the single 7×7 convolution kernel that DenseNet121 uses to extract expression features, this paper introduces multi-scale convolution kernels to extract multi-scale features of the input image, which is more conducive to subsequent expression classification than single-scale features. In addition, as the DenseNet121 network deepens, the extracted features keep expanding in the channel dimension while being compressed in the spatial dimension. Weighting the contribution of different channels to the subsequent expression classification is therefore more valuable than weighting the contribution of different spatial positions to facial
expression classification. The channel attention module MECANet is therefore inserted into the DenseNet121 model. Experiments on the CK+ and FER2013 datasets show that the improved DenseNet model achieves significantly higher static-image expression recognition accuracy than the DenseNet121 model. (2) For image sequences, taking into account the importance of both global and local temporal information, a multi-network fusion model is proposed. The model consists of three modules: a MECA-Inception-ResNet + BiLSTM model that extracts global temporal information from the image sequence, a 3DMECA-Convnet model that extracts local temporal information from the image sequence, and a soft-voting fusion strategy module. Experiments on the CK+ and Oulu-CASIA datasets show that extracting both global and local temporal information and combining them with the new decision strategy yields higher recognition accuracy than a single-network model. |
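
The soft-voting fusion named above can be illustrated with a minimal sketch. It assumes each branch outputs a probability distribution over the same expression classes and that the fused score is a weighted average of those distributions. The abstract does not state the weighting, so equal weights are an assumption here, and the function name `soft_vote` and the sample probabilities are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def soft_vote(
    branch_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[int, List[float]]:
    """Fuse per-branch class probabilities by weighted averaging.

    Returns the index of the winning class and the fused distribution.
    """
    if not branch_probs:
        raise ValueError("need at least one branch")
    n_classes = len(branch_probs[0])
    if any(len(p) != n_classes for p in branch_probs):
        raise ValueError("all branches must score the same classes")
    if weights is None:
        # Assumption: equal weights, since the source gives no weighting.
        weights = [1.0] * len(branch_probs)
    if len(weights) != len(branch_probs):
        raise ValueError("one weight per branch is required")

    total = sum(weights)
    # Weighted mean of the branch probabilities, class by class.
    fused = [
        sum(w * p[c] for w, p in zip(weights, branch_probs)) / total
        for c in range(n_classes)
    ]
    # The predicted label is the class with the highest fused probability.
    best = max(range(n_classes), key=fused.__getitem__)
    return best, fused


# Hypothetical outputs for one image sequence over three classes.
global_branch = [0.50, 0.25, 0.25]  # stands in for the global-temporal branch
local_branch = [0.25, 0.75, 0.00]   # stands in for the local-temporal branch
label, fused = soft_vote([global_branch, local_branch])
```

Unlike hard voting, which counts only each branch's top label, averaging the probabilities lets a confident branch outweigh an uncertain one, which is presumably why it suits a two-branch model where ties would otherwise be common.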