
Research And Application Of Video Prediction Algorithm Based On Deep Learning

Posted on: 2024-08-25
Degree: Master
Type: Thesis
Country: China
Candidate: C M Li
Full Text: PDF
GTID: 2568307127953379
Subject: Software engineering
Abstract/Summary:
As deep learning continues to evolve, significant progress has been made in video prediction research. Image frames are the basic building blocks of video, and a model can learn from them to predict how a scene is likely to evolve. Although video prediction also involves representation learning over consecutive images, it differs from discriminative tasks such as video anomaly detection and video classification, which extract only the key features that facilitate recognition or detection, such as object contours and textures, without preserving the details of the entire image or video. Video prediction, by contrast, is a more challenging generative task that requires not only extracting diverse features from past consecutive frames but also modeling highly complex and dynamic scenes. It has become an important research topic in computer vision, with a wide range of applications in anomaly detection, robot decision-making, weather forecasting, and autonomous driving.

This thesis investigates video prediction algorithms based on convolutional long short-term memory (ConvLSTM) networks and makes structural improvements from two different prediction perspectives to improve prediction accuracy.

First, a prediction model based on a ConvLSTM network and an encoder-decoder structure is improved. The spatiotemporal information learning module inside this network contains complex motion feature extraction units and differential gate operations, which effectively enhance the model's representation of the internal motion changes of objects in future scenes. A self-attention module is also designed to improve the network's ability to capture long-term dependencies of spatiotemporal dynamics in multi-frame prediction. Experiments are conducted on domestic and international driving-scene datasets (D²-City and BDD100K), and the results show that the algorithm achieves better structural prediction performance than a single ConvLSTM variant network model.

Second, to address the inefficiency of the above prediction structure in capturing short-term motion changes, an improved approach models future frame prediction with two branch networks. It guides the capture of short-term motion change features in the backbone prediction network through prediction learning on the differences between adjacent image frames. A Differential Attention Mechanism (DAM) is designed to guide the convolutional encoder to allocate attention according to the intensity of instantaneous motion at different locations; this module aims to alleviate the loss of motion detail caused by image encoding. In addition, a 3D convolutional neural network is introduced as a sequence-frame discriminator in the adversarial training process, which enforces consistency of the joint spatiotemporal data distribution between the predicted sequence frames and the real sequence frames. Experiments are conducted on the KITTI driving dataset, the Caltech pedestrian dataset, and the UCF-101 action recognition dataset. The results show that the proposed generative motion-assisted discrimination network achieves state-of-the-art prediction results on several datasets compared with current mainstream prediction algorithms, providing a general basic framework for subsequent related video processing tasks.

Finally, the network structure of the second video prediction model is extended to perform frame-level anomaly detection through video prediction on the CUHK Avenue pedestrian abnormal behavior dataset, and the proposed algorithm improves detection accuracy compared with other related video anomaly detection methods. The experiments show that the consecutive-image features extracted for the video prediction task generalize well.
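
The prediction-based anomaly detection described at the end of the abstract can be illustrated with a small sketch. The abstract does not state how prediction error is turned into an anomaly score, so everything below is an assumption made for illustration: the code uses the commonly seen convention of measuring the PSNR between each predicted frame and the real frame, then min-max normalizing PSNR over a clip so that poorly predicted frames receive scores near 1. The function names (`mse`, `psnr`, `anomaly_scores`), the `max_val` parameter, and the toy frames are hypothetical and are not taken from the thesis.

```python
import math


def mse(pred, target):
    """Mean squared error between two equally sized frames (lists of rows)."""
    total, count = 0.0, 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            total += (p - t) ** 2
            count += 1
    return total / count


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means a better prediction.

    The error is floored at a tiny value (an assumption of this sketch)
    so that a perfect prediction yields a large finite PSNR, not infinity.
    """
    err = max(mse(pred, target), 1e-10)
    return 10.0 * math.log10(max_val ** 2 / err)


def anomaly_scores(psnr_values):
    """Map per-frame PSNR values of one clip to scores in [0, 1].

    Assumed convention: score = 1 - min-max-normalized PSNR, so the frame
    that was hardest to predict in the clip gets a score of 1.
    """
    lo, hi = min(psnr_values), max(psnr_values)
    if hi == lo:
        # All frames were predicted equally well; nothing stands out.
        return [0.0 for _ in psnr_values]
    return [1.0 - (v - lo) / (hi - lo) for v in psnr_values]


# Toy 2x2 grayscale frames with pixel values in [0, 1].
actual = [[0.0, 0.5], [0.5, 1.0]]
good_pred = [[0.0, 0.5], [0.5, 0.9]]   # close to the real frame
bad_pred = [[0.5, 0.0], [1.0, 0.5]]    # far from the real frame

values = [psnr(good_pred, actual), psnr(bad_pred, actual)]
scores = anomaly_scores(values)
print(scores)
```

In this toy case the well-predicted frame receives the lowest score and the badly predicted frame the highest. A real pipeline would obtain the predicted frames from the trained prediction network and would choose a decision threshold on a validation set.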
Keywords/Search Tags: video prediction, deep learning, convolutional long short-term memory network, attention mechanism, generative adversarial