
Research On Deepfake Detection Technology Based On Transformer

Posted on: 2024-09-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Zhang
Full Text: PDF
GTID: 2568307100964079
Subject: Computer application technology

Abstract/Summary:
Deepfake refers to highly deceptive video content generated with deep-learning techniques such as autoencoders or generative adversarial networks. It is characterized by high credibility and easy accessibility. Recently, deepfakes have spread widely and become so realistic that humans cannot discern their authenticity; used maliciously, they can seriously damage personal reputations, harm companies, and threaten national security.

At present, the research community generally treats deepfake detection as a binary classification problem and uses convolutional neural network-based models to extract local features for distinguishing real from fake content. However, discerning fake videos from local features alone leads to poor generalization. At the same time, the huge parameter counts and computational costs of existing deepfake detectors result in low detection efficiency. To address these limitations, this thesis is devoted to studying transformer-based deepfake detection methods. The main work of this thesis is as follows:

1) To address insufficient detection efficiency, a robust and lightweight transformer-based deepfake detection framework is proposed to achieve efficient and accurate detection. For the problem of coarse-grained local and global learning, a robust transformer module is designed to learn fine-grained local and global features by focusing on intra-patch locally-enhanced relations and inter-patch locally-enhanced global relations in face images. To eliminate the noise the transformer introduces when modeling global dependencies, a learnable position importance matrix is added to the attention map to model locally-enhanced global relations. To remedy the imperfection of the transformer attention mechanism, a plug-and-play spatial attention scaling technique is designed to emphasize salient global features while suppressing irrelevant ones; it can be incorporated into any transformer-based model to improve representational ability without increasing computational complexity. Finally, an evaluation of parameter counts and computational overhead, together with experimental results on several benchmark datasets, demonstrates the efficiency and robustness of the proposed models.

2) To improve the generalization ability of deepfake detectors, a deepfake detection framework based on distilled transformers is proposed to achieve powerful generalization. To solve the problem that transformers treat every attention head equally, a multi-head attention scaling method is designed to adaptively select attention maps and model various global dependencies; it can be introduced into any transformer-based model to improve generalization without increasing computational cost. To tackle the limited information provided by hard labels, a continual-learning scheme is proposed that combines metric learning with self-distillation to constrain model learning and gradually improve performance. Experimental results on five public deepfake datasets demonstrate the strong generalization of the model.

Both critical works proposed in this thesis are verified on five publicly available benchmark datasets, and the results show that both models achieve promising, state-of-the-art performance. Compared with the current lightweight detection model Xception, the first model reduces the number of parameters by 7.1M and the computational cost by 0.3G while increasing the AUC by 1.1%. Compared with the current strong-generalization model SBIs, the second model increases the AUC by 5.6% in cross-dataset evaluation on Celeb-DF.
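The first framework's core idea of adding a learnable position importance matrix to the attention map can be illustrated with a minimal sketch. The thesis abstract does not give the exact formulation, so the function name, the additive placement of the bias before the softmax, and the distance-based toy prior below are all illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def locally_enhanced_attention(q, k, v, pos_importance):
    """Scaled dot-product attention with a learnable position importance
    matrix added to the attention logits (a sketch of the idea, not the
    thesis's exact method): the bias lets the model emphasize relations
    between particular patch positions, e.g. nearby patches."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)      # (N, N) patch-to-patch scores
    logits = logits + pos_importance   # learnable positional bias
    attn = softmax(logits, axis=-1)    # rows sum to 1
    return attn @ v                    # (N, d) aggregated features

# toy example: 4 patches with 8-dim embeddings
rng = np.random.default_rng(0)
N, d = 4, 8
q, k, v = (rng.standard_normal((N, d)) for _ in range(3))
# hypothetical locality prior: penalize attention to distant patches
idx = np.arange(N)
pos_importance = -0.5 * np.abs(idx[:, None] - idx[None, :])
out = locally_enhanced_attention(q, k, v, pos_importance)
print(out.shape)  # (4, 8)
```

In a real transformer, `pos_importance` would be a trainable parameter updated by backpropagation rather than a fixed distance prior.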
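The second framework's multi-head attention scaling, which re-weights heads instead of treating them equally, can likewise be sketched. The gating scheme below (a softmax over one learnable scalar per head) is an assumption for illustration; the thesis abstract does not specify the actual parameterization:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multihead_attention_scaling(attn_maps, head_gates):
    """Re-weight each head's attention map with a learnable gate so the
    model can adaptively emphasize the heads that capture useful global
    dependencies (illustrative sketch, not the thesis's exact method)."""
    gates = softmax(head_gates)              # normalize gates across heads
    return attn_maps * gates[:, None, None]  # scale each head's (N, N) map

# toy example: 3 heads over 4 patches
H, N = 3, 4
rng = np.random.default_rng(1)
attn_maps = softmax(rng.standard_normal((H, N, N)), axis=-1)
head_gates = np.array([0.2, 1.5, -0.3])      # learned in practice
scaled = multihead_attention_scaling(attn_maps, head_gates)
print(scaled.shape)  # (3, 4, 4)
```

Because the gates are a fixed number of scalars (one per head), this adds essentially no parameters or computation, consistent with the abstract's claim of improving generalization without extra cost.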
Keywords/Search Tags:Deepfake detection, Transformer, Metric learning, Knowledge distillation, Attention mechanism