Video Prediction Based On 3D Convolution Neural Network

Posted on: 2022-10-27
Degree: Master
Type: Thesis
Country: China
Candidate: M Q Yang
GTID: 2518306353483664
Subject: Computer Science and Technology
Abstract/Summary:
Video prediction is a technique in which a computer analyzes the dynamic content of a video and predicts likely future frames. The key task is to learn spatial and temporal features effectively. Because consecutive video frames are strongly correlated and future frames are inherently uncertain, video prediction is extremely challenging. This thesis addresses four problems of the MCnet video prediction model, which is based on a dual-stream architecture: low prediction accuracy, blurry predicted images, unstable network training, and low realism of the predicted images. To address them, it proposes a video prediction model based on 3D convolution.
First, to address the low prediction accuracy of MCnet, this thesis uses a 3D convolutional neural network to construct a motion encoder that encodes the dynamic information of the video sequence. 3D convolution not only extracts two-dimensional image features but also effectively fuses the motion information of adjacent video frames. It is therefore well suited to modeling video data and improves prediction accuracy.
Second, a perceptual loss function is proposed as the optimization objective of the MCnet video prediction model. The perceptual loss guides the prediction network to account for differences between the real and predicted images in both low-level features (such as color and edges) and high-level features (such as content and global structure), so that the predicted image is more similar to the real video frame and better matches human visual perception.
Finally, this thesis combines the 3D convolutional neural network and the perceptual loss function to construct a new, complete video prediction model and trains it with an improved adversarial training method, which resolves the training difficulty and non-convergence of the MCnet video prediction model. The improved adversarial training optimizes the performance of the network model. Moreover, the sample distribution of the generated video frames is closer to that of the real video frames, and the generated images are more similar to the real frames, which further addresses the problem of image blur. The proposed model is evaluated on the KTH and UCF101 datasets and compared with other advanced models. The results show that the video frames it predicts are closer to the real video frames and that it outperforms the compared models.
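The abstract's point that 3D convolution fuses motion information from adjacent frames can be illustrated with a minimal sketch. This is not the thesis's motion encoder: it is a naive, single-channel 3D cross-correlation (the operation deep learning frameworks usually call convolution) in plain Python, and the names `conv3d_valid`, `moving_dot_clip`, and `temporal_diff` are hypothetical, as is the toy clip. The kernel is fixed by hand here, whereas a real encoder would learn its weights.

```python
def conv3d_valid(video, kernel):
    """Naive single-channel 3D cross-correlation without padding.

    video:  nested list indexed [t][y][x]
    kernel: nested list indexed [dt][dy][dx]
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Sum over a window that spans time as well as space.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += kernel[dt][dy][dx] * video[t + dt][y + dy][x + dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def moving_dot_clip(num_frames=3, size=4):
    """Toy clip (assumption): one bright pixel on row 1 moving right by one pixel per frame."""
    clip = []
    for t in range(num_frames):
        frame = [[0.0] * size for _ in range(size)]
        frame[1][t] = 1.0
        clip.append(frame)
    return clip


# Hand-picked kernel spanning 2 frames x 1 x 1 pixels:
# each output value is (next frame) minus (current frame) at that pixel.
temporal_diff = [[[-1.0]], [[1.0]]]

clip = moving_dot_clip()
motion = conv3d_valid(clip, temporal_diff)

# Row 1 of each output plane shows where the dot left (-1) and where it arrived (+1).
for t, plane in enumerate(motion):
    print(t, plane[1])
```

Because the kernel extends over two frames, each output value depends on how a pixel changes over time, which a 2D kernel applied to one frame at a time cannot capture.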
Keywords/Search Tags: Video prediction, Deep learning, 3D convolutional neural network, Perceptual loss function, Adversarial training