
Research On Video Generation And Forgery Detection Technology Based On Deep Learning

Posted on: 2023-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: J Xu
Full Text: PDF
GTID: 2568306902457034
Subject: Cyberspace security
Abstract/Summary:
Forged visual content synthesized by deep learning has become a serious threat to cyberspace security in recent years. Using deepfake videos, attackers can spread misinformation, extort money, and mislead public opinion. To mitigate these threats, forgery detection technology is urgently needed. Among all deepfake videos, the most harmful type is human-centric video forgery (e.g., fake videos of celebrities making sensitive speeches or engaging in sensitive behavior), which is the focus of this thesis. Despite considerable effort by the research community, two problems remain. First, current fake video detection models tend to directly adopt architectures developed for semantic classification, and lack an in-depth understanding of structure design for forgery detection. Second, existing works and forensic datasets mainly focus on detecting edits to existing videos (such as face swapping and face reenactment, which leave forgery traces that are easy to detect), while ignoring whole-video generation from scratch, which does not require an existing video.

In response to these problems, this thesis carries out research in two aspects. First, this thesis explores high-quality video content generation. Compared with methods based on local tampering of existing videos, video generation is an emerging form of deepfake and a potential attack on existing detection methods. Since existing works mainly focus on detecting editing-based tampering of existing videos, the detection of generated video has not been fully explored. This thesis mainly targets motion sequence generation. Motivated by studies in linguistics, we decompose body motions into two complementary parts: pose modes and rhythmic dynamics. Accordingly, we introduce a novel freeform motion generation model with a two-stream architecture, i.e., a pose mode branch for primary posture generation and a rhythmic motion branch for rhythmic dynamics synthesis. On the one hand, diverse pose modes are generated by conditional sampling in a latent space, guided by speech semantics. On the other hand, rhythmic dynamics are synchronized with the speech prosody. Extensive comparisons with existing methods on two datasets demonstrate the ability of the proposed method to generate highly diverse and plausible body motions. By combining this model with existing work on video generation, we further introduce an end-to-end video generation approach: given audio as input, target videos can be generated without requiring an existing video.

Second, this thesis explores architecture design for forged video detection, covering both established and newly emerging types of forged video. By studying the essential ingredients of forgery detection models and profiling the best-performing architectures, we propose an efficient model for detecting both video tampering and video generation. Using the weight-sharing technique in neural architecture search (NAS), we conduct a thorough analysis of a massive number of detection models and observe how their performance is affected by different structural patterns. Key findings include: 1) operations in shallow layers are critical and deserve more computational capacity; 2) the ability to capture large-scope information is desirable (e.g., large kernel sizes, wide connections), especially in shallow layers. These findings sketch a rough profile of good models for fake image detection, which differ markedly from those for standard semantic classification. Based on our analysis, we propose a new Depth-Separable Search Space (DSS) for fake video detection. Compared with state-of-the-art methods, our models achieve comparable performance while using more than 75% fewer parameters, and outperform existing models in detecting both video tampering and video generation.

To sum up, this thesis conducts in-depth research on deep video generation and detection technologies, which are inherently adversarial to each other. On the one hand, we explore high-quality video generation and its technical characteristics, demonstrating the potential threats to cyberspace security. On the other hand, we design corresponding detection models for video forensics. This thesis fills gaps in existing work on the design of architecture ingredients and the analysis of new forgery types, providing insights for better detection of forged videos.
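The architecture profiling described in the abstract (sampling many detection models from a search space and observing how structural patterns affect performance) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the thesis's actual setup: the search space (one kernel size per layer), the names `KERNEL_CHOICES`, `NUM_LAYERS`, `sample_architecture`, and `profile_by_first_layer`, and the `evaluate` callable, which stands in for scoring a sub-network with weights shared from a trained supernet.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical search space: each layer picks one kernel size.
# The thesis's real search space (DSS) is not specified in the abstract.
KERNEL_CHOICES = (3, 5, 7)
NUM_LAYERS = 6


def sample_architecture(rng):
    """Draw one sub-network as a tuple of kernel sizes, one per layer."""
    return tuple(rng.choice(KERNEL_CHOICES) for _ in range(NUM_LAYERS))


def profile_by_first_layer(evaluate, num_samples=200, seed=0):
    """Group sampled architectures by the kernel size of their first
    (shallowest) layer and average the score within each group.

    `evaluate` is a caller-supplied function mapping an architecture to a
    score. In weight-sharing NAS it would typically run validation with
    weights inherited from a supernet; here it is only a placeholder.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for _ in range(num_samples):
        arch = sample_architecture(rng)
        groups[arch[0]].append(evaluate(arch))
    return {kernel: mean(scores) for kernel, scores in sorted(groups.items())}
```

Comparing the per-group averages is one simple way to check a claim such as "large kernels in shallow layers help": if the pattern matters, the groups should separate. The same grouping could be applied to any other structural attribute, such as connection width or layer depth.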
Keywords/Search Tags: Cognition Security, Video Generation and Forensics, Deepfake, Generative Model, Visual Classification