Video Prediction Based On 3D Convolution Neural Network

Posted on:2022-10-27

Degree:Master

Type:Thesis

Country:China

Candidate:M Q Yang

Full Text:PDF

GTID:2518306353483664

Subject:Computer Science and Technology

Abstract/Summary:

Video prediction is a technology in which a computer analyzes the dynamic images in a video and predicts likely future scenes. The key task of video prediction is to learn spatial and temporal features effectively. Because consecutive video frames are strongly correlated and the content of future frames is uncertain, the task is extremely challenging. This thesis addresses four problems of the MCnet video prediction model, which is based on a two-stream architecture: low prediction accuracy, blurry predicted frames, unstable network training, and low realism of the predicted frames. To address them, a video prediction model based on 3D convolution is proposed.

First, to address the low prediction accuracy of the MCnet model, this thesis uses a 3D convolutional neural network to construct a motion encoder that encodes the dynamic information of the video sequence. 3D convolution not only extracts two-dimensional image features but also effectively fuses the motion information of adjacent video frames. It is therefore well suited to modeling video data and improves prediction accuracy.

Second, a perceptual loss function is proposed as the optimization objective of the MCnet model. The perceptual loss guides the prediction network to account for both the low-level feature loss (such as color and edges) and the high-level feature loss (such as content and global structure) between the real image and the predicted image, so that the predicted image is more similar to the real video frame and more consistent with human visual perception.

Finally, this thesis combines the 3D convolutional neural network and the perceptual loss function to construct a new, complete video prediction model, and trains it with an improved adversarial training method, which addresses the training difficulty and non-convergence of the MCnet model. The improved adversarial training optimizes the performance of the network. Moreover, the sample distribution of the generated video frames moves closer to that of the real video frames, and the generated images are more similar to the real frames, which further reduces image blur.

The proposed prediction model is evaluated on the KTH and UCF101 datasets and compared with other advanced models. The results show that the video frames it predicts are closer to the real video frames and that it outperforms the compared models.
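The statement that 3D convolution fuses the motion information of adjacent frames can be illustrated with a minimal sketch. It is not the thesis's motion encoder: the function name `conv3d_valid`, the single-channel nested-list layout, and the hand-picked temporal-difference kernel are all assumptions chosen for illustration. A real encoder would learn its kernels and use a deep learning framework.

```python
def conv3d_valid(video, kernel):
    """Single-channel 'valid' 3D cross-correlation over (time, height, width).

    Illustrative only: both arguments are nested lists indexed [t][y][x].
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # The kernel spans several frames (dt) as well as a spatial
                # window (dy, dx), so one output value mixes adjacent frames.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Hypothetical kernel of shape 2x1x1: computes frame[t+1] - frame[t] per pixel.
temporal_diff = [[[-1.0]], [[1.0]]]

# Three 2x3 frames in which a bright pixel moves one step right per frame.
video = [
    [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]],
]

motion = conv3d_valid(video, temporal_diff)
for plane in motion:
    print(plane)
```

Each output plane is negative where the pixel was and positive where it moved to, which is the kind of inter-frame cue a purely 2D convolution applied to one frame at a time cannot produce.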

Keywords/Search Tags:

Video prediction, Deep learning, 3D convolutional neural network, Perceptual loss function, Adversarial training

Related items

1	The Design And Application Of Perceptual Loss Function In Deep Learning
2	Research And Development Of Image Inpainting Algorithm Based On Global And Local Perceptual Adversarial Networks
3	Research On Deep Neural Network Adversarial Samples Defense
4	Research On Image Super-resolution Based On Generative Adversarial Network
5	Video Scene Prediction Based On Deep Learning
6	Research And Implementation Of Video Compression Based On Deep Learning
7	Research On Robustness Of Deep Learning Model Based On Adversarial Examples
8	Research And Application Of Video Prediction Algorithm Based On Deep Learning
9	Defense Against Adversarial Attacks By Reconstructing Images
10	Research On Structure Improvement And Training Optimization Algorithm Of Deep Convolutional Neural Networks