Research On Multi-frame Based Deep Face Recognition In Videos

Posted on: 2021-10-04
Degree: Master
Type: Thesis
Country: China
Candidate: C Liu
Full Text: PDF
GTID: 2518306104488494
Subject: Computer application technology
Abstract/Summary:
With the development of intelligence and automation, face recognition has become one of the first artificial intelligence technologies to be widely deployed, thanks to its non-contact operation, high recognition rate, security, and other advantages. Mainstream face recognition in industry is based on static images. Videos, however, contain richer semantic information than images, and working directly on video sequences is not only more efficient but also more robust. This thesis proposes a complete pipeline for video face recognition. For the face detection module, inspired by general object detection, the Receptive Field Block module and depthwise separable convolution are used to design a lightweight single-stage, multi-scale, multi-task face detector. The detector performs face bounding box localization and facial landmark localization at the same time, and it performs well in terms of both speed and accuracy. For the face tracking module, the real-time multi-object tracking algorithm SORT is combined with the face detector to achieve real-time face tracking in different video scenarios with various resolutions. To improve the image quality of the dataset used for training a better face recognition model, a performance-based face quality evaluator is designed: a third-party face feature extractor is used to select images that perform well in the face recognition task, and a simple but efficient convolutional neural network is then trained to predict quality by regression. The distribution of the predicted results is roughly consistent with human visual perception. The thesis also proposes an automatic labelling algorithm for video face datasets. Face tracking produces a large number of unlabeled video face sequences, which are sent to the face quality assessment module and the face clustering module in order to merge data of the same class and to remove outliers and low-quality frames. In the video face feature extraction stage, an efficient spatio-temporal non-local attention mechanism and a feature pooling layer are applied for video-level feature extraction. The classification cross-entropy loss function is combined with an optimized metric learning loss function as the supervision signal to ensure that the extracted features are compact and discriminative. The proposed method achieves an accuracy of 96.31% on the YouTube Faces video face recognition benchmark.
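To make the idea of a video-level feature concrete, the following is a minimal sketch of pooling per-frame face embeddings into one vector and comparing two videos by cosine similarity. It is not the thesis's method: the abstract describes a spatio-temporal non-local attention mechanism followed by a feature pooling layer, whereas this sketch substitutes a plain softmax-weighted average driven by per-frame quality scores. The function names (`pool_video_feature`, `cosine_similarity`) and the use of quality scores as pooling weights are assumptions made for illustration only.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pool_video_feature(frame_features, quality_scores):
    """Hypothetical pooling step: weight each frame embedding by the softmax
    of its quality score, sum the weighted embeddings, and re-normalize."""
    weights = softmax(quality_scores)
    dim = len(frame_features[0])
    pooled = [0.0] * dim
    for w, feat in zip(weights, frame_features):
        for i, x in enumerate(l2_normalize(feat)):
            pooled[i] += w * x
    return l2_normalize(pooled)


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))
```

In a verification setting such as YouTube Faces, two videos would each be reduced to one pooled vector and declared the same identity when their cosine similarity exceeds a threshold chosen on held-out data. Weighting by quality lets sharp, frontal frames dominate the pooled feature while blurred frames contribute little.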
Keywords/Search Tags:face detection, real-time face tracking, face quality assessment, automatic labelling, video face recognition