
Research On Video Prediction With Noise

Posted on: 2024-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: C H Zhang
Full Text: PDF
GTID: 2568307103974679
Subject: Computer Science and Technology
Abstract/Summary:
Video prediction is a representative task in computer vision that aims to estimate the future from observed frames and output the predicted frames. It has been applied in a wide range of research areas, such as precipitation forecasting, traffic flow prediction, autonomous driving and robotic control. However, existing video prediction methods assume that the input video is lossless and contains no noise. In practical application scenarios, video prediction is disturbed by various types of noise, such as occlusion noise and adversarial noise, which degrade the prediction results of the model. At the same time, existing video prediction algorithms still suffer from narrow spatiotemporal receptive fields and from the loss of texture features in deep network models. To address these problems, this thesis focuses on video prediction algorithms under noise interference. The main contributions are as follows:

(1) To tackle the narrow spatiotemporal receptive field and the effect of occlusion noise on spatiotemporal continuity, this thesis explores occlusion video prediction for the first time and proposes a Fast Fourier Inception Network for occluded video prediction (FFINet). According to the spectral convolution theorem in Fourier theory (a change to a single value in the frequency domain can affect the overall value), a global spatiotemporal receptive field is obtained with few parameters. Based on this, the thesis designs an occlusion video prediction network based on the Fast Fourier Transform to solve Problem (2). An inpainter designed to restore occluded regions and a reduction loss are proposed to mitigate the interference of occlusion noise with the video prediction task.

(2) To alleviate the loss of texture features in deep network models and to explore the feasibility of positive occlusion noise, this thesis implements positive-excitation occlusion noise and proposes a video prediction method based on a symmetric layer attention mechanism (PLA-VP). Starting from the symmetry of video prediction models, the thesis proposes a symmetric layer attention mechanism that uses shallow texture features to supplement the corresponding deep semantic features, enriching the texture details in the predicted video frames. In addition, the thesis reexamines the impact of occlusion noise on video prediction and proposes a visual mask pre-training method. A mask is added to the input video during the pre-training stage, and the encoder and decoder are trained to restore it. This artificially increases the difficulty of the encoder's task and thereby prompts the encoder to extract more effective spatial features from limited images.

(3) To address the security of video prediction algorithms, this thesis studies the impact of adversarial noise on video prediction tasks and tests the effect of different adversarial attack methods on video prediction models with different network structures. The results reveal that the vanishing gradient problem is the source of the robustness of recurrent neural networks. Further, by considering the characteristics of the video prediction task, a motion-aware attack (MAA) algorithm is proposed, which uses optical flow to let adversarial noise attack models from the perspective of temporal characteristics. MAA effectively degrades the predictions of convolutional neural networks and also strengthens the attack against recurrent neural networks.

For research work (1) and (2), extensive quantitative, qualitative and ablation experiments were conducted on the video prediction datasets Moving MNIST, TaxiBJ, Human3.6M, KTH, and KITTI&Caltech. Measured by MSE, MAE, and SSIM, the experimental results show that the proposed FFINet and PLA-VP methods perform better than existing methods. The Fast Fourier Inception module expands the spatiotemporal receptive field of the model, and the inpainter and reduction loss restore occluded video frames well. The combination of visual mask pre-training and the symmetric layer attention mechanism makes the predicted video frames contain more detailed information. For research work (3), extensive quantitative, qualitative and ablation experiments were conducted on the datasets Moving MNIST and Human3.6M. The experimental results show that the proposed MAA attack algorithm produces strong attack effects on various types of video prediction networks.
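
The abstract's remark that a change to a single value in the frequency domain affects the overall value can be checked numerically. The following is a minimal sketch in plain Python, not the FFINet implementation: the helper names `dft`, `idft` and `perturb_one_frequency`, the toy 1-D signal, and the chosen frequency bin are all assumptions made for illustration. It perturbs one Fourier coefficient and measures how much each sample of the reconstructed signal moves.

```python
import cmath

def dft(signal):
    """Naive O(N^2) discrete Fourier transform of a 1-D sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    """Naive inverse discrete Fourier transform (includes the 1/N factor)."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

def perturb_one_frequency(signal, k, delta):
    """Add delta to frequency bin k and return the change magnitude per sample."""
    spectrum = dft(signal)
    spectrum[k] += delta              # edit exactly one frequency-domain value
    modified = idft(spectrum)         # back to the sample domain
    return [abs(m - s) for m, s in zip(modified, signal)]

# Toy signal (hypothetical values), length N = 8.
signal = [0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 3.0]
changes = perturb_one_frequency(signal, k=2, delta=4.0)
print(changes)
```

Every entry of `changes` has magnitude `|delta| / N`, so a single frequency-domain edit reaches all positions at once. This is the property that lets a frequency-domain layer cover a global receptive field with few parameters, whereas a small spatial kernel only reaches its local neighbourhood.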
Keywords/Search Tags:Video Prediction, Occlusion Noise, Adversarial Attack Noise, Convolutional Neural Network, Spatiotemporal Learning