| Facial expressions are powerful, natural, and universal signals through which human beings convey their emotional states and intentions. With the rapid development of computer vision, facial expression recognition (FER) technology has become increasingly prevalent in various fields. It has shown significant commercial value in driver fatigue surveillance, augmented reality, human-computer interaction, and related fields. However, most current FER algorithms do not make full use of the dynamic information in expressions, and the use of powerful networks with strong feature extraction abilities is impeded by the insufficient number of samples in datasets. Moreover, FER is used in some important fields, and its datasets often contain private data such as face information. Once the network is attacked, the recognition accuracy may drop sharply or the dataset may be leaked, causing serious harm. It is therefore necessary to study FER networks with strong robustness and privacy protection. This thesis studies video-based FER algorithms and their security. The main work is as follows. First, to address the common problems of current FER algorithms, this thesis proposes a spatial-temporal dual-branch network to extract expression features. The temporal feature extraction network extracts facial landmarks from the image sequence to construct the temporal features of an expression, while the spatial feature extraction network extracts the spatial features of an expression from a static frame and uses a densely connected structure to enhance feature reuse. Second, the image preprocessing method is optimized according to the characteristics of expression images to reduce the network's overfitting on small datasets. Third, common model fusion methods are analyzed, and a mixed fusion method is designed according to the characteristics of FER tasks and the spatial-temporal network. By introducing sorting scores into the aggregation function, the outputs of the branches are made comparable, so that the temporal and spatial features of expressions can be used together. Finally, experiments are conducted to verify the effectiveness of each module and the improvement in recognition accuracy achieved by the spatial-temporal fusion network. In addition, this thesis designs an adversarial attack defense module and an inference attack defense module, targeting two main security risks faced by FER networks. The adversarial attack defense module uses random occlusion and image compression to destroy the structure of adversarial perturbations, and augments the dataset following the idea of adversarial training, so as to reduce the negative impact of the defense module on recognition accuracy and enhance the robustness of the network. The inference attack defense module first uses a conditional deep convolutional generative adversarial network to hide identity information while preserving expression features. On this basis, the network structure and loss function are modified to further reduce the information that can be recovered by a reconstruction attack. Finally, experiments verify that the two modules can enhance the network's security without significantly reducing recognition accuracy. |
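The abstract does not give the form of the aggregation function, so the following is only an illustrative sketch of how sorting (rank) scores could make the outputs of two branches comparable before fusion. The function names `rank_scores`, `fuse`, and `predict`, the branch weights `w_spatial` and `w_temporal`, and the mixing factor `lam` are all hypothetical and are not taken from the thesis.

```python
def rank_scores(probs):
    """Convert per-class scores to rank scores in [0, 1].

    The lowest-scoring class gets 0.0 and the highest gets 1.0, so the
    result depends only on the ordering, not on how peaked the scores are.
    Ties are broken by class index (Python's sort is stable).
    """
    n = len(probs)
    order = sorted(range(n), key=lambda i: probs[i])  # ascending by score
    ranks = [0.0] * n
    for position, idx in enumerate(order):
        ranks[idx] = position / (n - 1) if n > 1 else 1.0
    return ranks


def fuse(spatial_probs, temporal_probs, w_spatial=0.5, w_temporal=0.5, lam=1.0):
    """Hypothetical mixed fusion: raw score plus lam * rank score, per branch.

    This is an assumed aggregation rule for illustration only.
    """
    if len(spatial_probs) != len(temporal_probs):
        raise ValueError("both branches must score the same set of classes")
    r_s = rank_scores(spatial_probs)
    r_t = rank_scores(temporal_probs)
    return [
        w_spatial * (p_s + lam * rs) + w_temporal * (p_t + lam * rt)
        for p_s, rs, p_t, rt in zip(spatial_probs, r_s, temporal_probs, r_t)
    ]


def predict(spatial_probs, temporal_probs, **kwargs):
    """Return the index of the class with the highest fused score."""
    fused = fuse(spatial_probs, temporal_probs, **kwargs)
    return max(range(len(fused)), key=lambda i: fused[i])


if __name__ == "__main__":
    spatial = [0.10, 0.85, 0.05]   # confident spatial branch
    temporal = [0.40, 0.35, 0.25]  # flatter temporal branch
    print(fuse(spatial, temporal))
    print(predict(spatial, temporal))
```

The rank term puts a flat output distribution and a sharply peaked one on the same scale, which is one plausible reading of "comparable" here. With `lam=0` the sketch reduces to an ordinary weighted average of branch scores.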