
Video Scene Prediction Based On Deep Learning

Posted on: 2022-06-16    Degree: Master    Type: Thesis
Country: China    Candidate: Z X Li    Full Text: PDF
GTID: 2518306353984509    Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the improvement of computing power, the capability of computers on vision tasks has advanced considerably, and many applications in the field of computer vision have made great breakthroughs. Among the various computer vision tasks, video prediction has received widespread attention because it requires no human-annotated data and large amounts of video data are readily available. Video prediction aims to let a model automatically generate future image frames by learning from a series of previous frames. However, compared with still images, video contains not only spatial dependence but also temporal dependence, which makes video prediction extremely challenging. Although research on video prediction has gradually shifted from methods focused on pixel-level regularities to methods focused on motion information, most existing approaches still generate unclear, blurry future frames that lack local detail, especially in long-term prediction.

For the video prediction task, this paper first studies and analyzes the current mainstream prediction algorithms. It then discusses MCnet, a video prediction network based on spatiotemporal feature decomposition, and examines its prediction principles and remaining problems. Building on MCnet, this paper proposes an improved video prediction algorithm based on deep spatiotemporal features. The main contributions are as follows:

1) To address the gradient vanishing, gradient explosion, and mode collapse problems of traditional generative adversarial networks, the paper introduces the WGAN-GP framework, which resolves these problems and improves the convergence speed of the network (sketched below).

2) To address the insufficient motion prediction ability of MCnet, this paper proposes a motion decoder, which strengthens the motion encoder's ability to predict motion by introducing a motion loss.

3) To further strengthen motion prediction, the paper proposes a frame difference loss: the differences between consecutive predicted frames are compared with the differences between consecutive real future frames, strengthening the entire network's ability to encode and predict motion (sketched below).

4) Considering MCnet's weak ability to predict image edges and details, and the blurring caused by the MSE loss, this paper proposes a feature loss and adds the HED edge extraction network; the edge features and shallow content features inside HED serve as the feature loss, strengthening the network's ability to predict edges and details and improving prediction quality (sketched below).

Finally, the paper evaluates the improved algorithm and current representative video prediction algorithms on the KTH, UCF101, and KITTI datasets, using PSNR and SSIM as evaluation metrics (sketched below). The experimental results show that the proposed algorithm based on deep spatiotemporal features outperforms existing prediction algorithms on every dataset. Compared with other algorithms, it shows clear advantages in prediction ability, especially motion prediction, and also achieves markedly better image quality.
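The abstract gives no implementation details, so the following is a minimal sketch of the WGAN-GP gradient penalty the paper adopts, written in PyTorch. The critic interface, the tensor shapes, and the penalty weight lambda_gp = 10 are illustrative assumptions, not the thesis code.

```python
import torch

def gradient_penalty(critic, real_frames, fake_frames, lambda_gp=10.0):
    """WGAN-GP penalty: push the critic's gradient norm toward 1
    on random interpolations between real and generated frames."""
    batch_size = real_frames.size(0)
    # One interpolation coefficient per sample, broadcast over (C, H, W).
    alpha = torch.rand(batch_size, 1, 1, 1, device=real_frames.device)
    interpolated = (alpha * real_frames + (1.0 - alpha) * fake_frames)
    interpolated = interpolated.detach().requires_grad_(True)

    critic_scores = critic(interpolated)
    grads = torch.autograd.grad(
        outputs=critic_scores,
        inputs=interpolated,
        grad_outputs=torch.ones_like(critic_scores),
        create_graph=True,
        retain_graph=True,
    )[0]
    grad_norm = grads.view(batch_size, -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()
```

In a WGAN-GP critic update, this term is added to the Wasserstein loss, e.g. `loss_D = fake_scores.mean() - real_scores.mean() + gradient_penalty(critic, real, fake)`, replacing the weight clipping of the original WGAN.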
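The frame difference loss is described only at a high level; a plausible reading is a penalty between the temporal differences of consecutive predicted frames and those of the corresponding real future frames. The sketch below follows that reading; the L1 norm and the (batch, time, channel, height, width) layout are assumptions.

```python
import torch
import torch.nn.functional as F

def frame_difference_loss(pred_frames, real_frames):
    """Compare the motion (temporal differences) of predicted and real sequences.

    Both tensors are assumed to have shape (B, T, C, H, W) with T >= 2.
    """
    # Difference between consecutive frames approximates per-pixel motion.
    pred_diff = pred_frames[:, 1:] - pred_frames[:, :-1]
    real_diff = real_frames[:, 1:] - real_frames[:, :-1]
    # L1 distance between the two motion fields (assumed; the thesis may use L2).
    return F.l1_loss(pred_diff, real_diff)
```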
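The feature loss uses edge and shallow content features from the HED network; since the thesis code is not given, the sketch below treats the edge network as an opaque, frozen feature extractor (`edge_net` is a hypothetical stand-in for HED) that returns a list of feature maps, and uses an L1 distance between predicted-frame and real-frame features.

```python
import torch
import torch.nn.functional as F

def feature_loss(edge_net, pred_frames, real_frames):
    """Perceptual-style loss on edge / shallow content features.

    `edge_net` stands in for a frozen edge-extraction network (HED in the
    thesis), assumed to return a list of feature maps of shape (B, C, H, W).
    """
    with torch.no_grad():               # targets carry no gradient
        real_feats = edge_net(real_frames)
    pred_feats = edge_net(pred_frames)  # gradients flow back to the predictor
    loss = pred_frames.new_zeros(())
    for pf, rf in zip(pred_feats, real_feats):
        loss = loss + F.l1_loss(pf, rf)
    return loss
```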
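For the PSNR/SSIM evaluation, a minimal per-sequence computation could look like the sketch below, using scikit-image (0.19+ for the `channel_axis` argument); frames normalized to [0, 1] and the (T, H, W, C) layout are assumptions.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_sequence(pred_frames, real_frames):
    """Average PSNR/SSIM over a predicted sequence.

    Both arrays are assumed to be float arrays of shape (T, H, W, C) in [0, 1].
    """
    psnr_vals, ssim_vals = [], []
    for pred, real in zip(pred_frames, real_frames):
        psnr_vals.append(peak_signal_noise_ratio(real, pred, data_range=1.0))
        ssim_vals.append(structural_similarity(real, pred, data_range=1.0,
                                               channel_axis=-1))
    return float(np.mean(psnr_vals)), float(np.mean(ssim_vals))
```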
Keywords/Search Tags: deep learning, video prediction, generative adversarial network, spatiotemporal features, compound loss