Research On Road Scene Video Frame Prediction Method Based On Visual And Semantic Two-Stream Fusion

Posted on:2024-07-30

Degree:Master

Type:Thesis

Country:China

Candidate:Y Zhang

GTID:2542307157969089

Subject:Traffic and Transportation Engineering

Abstract/Summary:

Video frame prediction is a pixel-level prediction task based on deep learning: given a sequence of observed input frames, the model predicts and generates future frames. The task is difficult because the model must handle both the complex texture and appearance of video frames and their time-varying motion. Because it addresses scene and action prediction, however, video frame prediction has broad application prospects in robotic systems, autonomous driving, and video surveillance. Depending on the number of frames predicted, the task is divided into single-frame prediction and multi-frame prediction.

This thesis designs a semantic-fusion two-stream prediction model to accomplish both tasks. The model comprises one semantic auxiliary path and two prediction paths. The semantic auxiliary path takes semantic maps as input; it first extracts semantic features with CNNs and then explores the spatio-temporal relationships among different semantic features with GCNs. The resulting features are used to assist prediction. By introducing the scene information in the image into the model, the semantic path helps the model understand the scene better. The two prediction paths take optical flow maps and original RGB images as input, respectively. A CNN encodes the optical flow and RGB images to capture the spatio-temporal features contained in the video frames, and a ConvLSTM and a convolutional decoder then predict future video frames. Finally, a fusion module fuses the output frames of the two prediction paths. This two-stage approach helps to produce better predictions. The multi-frame prediction model makes slight adjustments to the single-frame prediction model and introduces a ConvLSTM network with an attention mechanism, so that the input can be modeled over a longer time span.

Single-frame and multi-frame prediction experiments were carried out on the Caltech traffic dataset in two parts: ablation experiments and comparisons with previous algorithms. The ablation results show that both the semantic fusion strategy and the dual-path fusion strategy effectively improve the quality of the predicted images. Compared with previous algorithms, the proposed model predicts better and sharper images and exceeds most of them on the PSNR, SSIM, and LPIPS evaluation metrics.
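
Of the evaluation metrics named in the abstract, PSNR has the simplest closed form: PSNR = 10 * log10(MAX^2 / MSE), where MSE is the mean squared error between the predicted and ground-truth frames and MAX is the largest possible pixel value. The thesis's own evaluation code is not shown here, so the following is only a minimal sketch. It assumes 8-bit grayscale frames stored as nested lists, and the function name `psnr` and the parameter `max_value` are illustrative, not taken from the thesis.

```python
import math


def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized grayscale frames.

    Frames are nested lists of pixel intensities (an assumption made for this
    sketch); max_value is the largest possible pixel value, 255 for 8-bit images.
    """
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("frames must have the same shape")

    n_pixels = sum(len(row) for row in pred)
    if n_pixels == 0:
        raise ValueError("frames must not be empty")

    # Mean squared error over all pixels.
    squared_error = sum(
        (p - t) ** 2
        for p_row, t_row in zip(pred, target)
        for p, t in zip(p_row, t_row)
    )
    mse = squared_error / n_pixels

    # Identical frames have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    ground_truth = [[10, 20], [30, 40]]
    predicted = [[11, 21], [31, 41]]  # every pixel is off by one
    print(f"PSNR: {psnr(predicted, ground_truth):.2f} dB")
```

A higher PSNR means the predicted frame is closer to the ground truth in the pixel-wise sense. In the example every pixel differs by one intensity level, so the MSE is 1 and the PSNR is 20 * log10(255), about 48.13 dB. SSIM and LPIPS measure structural and perceptual similarity instead and are not covered by this sketch.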

Keywords/Search Tags:

Deep learning, Video Frame Prediction, Two-stream model, Semantic fusion, Caltech datasets

Related items

1	Research On Key Technologies Of Semantic Understanding Of Traffic Scene Based On Deep Representation Learning
2	Traffic Video Frame Prediction And Application Based On Deep Multi-branch Mask Network
3	Object Detection And Recognition Methods On Aerial Images From Low-altitude UAV Based On Deep Learning
4	Research On Image Semantic Segmentation Based On Deep Learning
5	Research On Vehicle Recognition In Video Streaming Environment Based On Deep Learning
6	Research Of Sonar Image Target Recognition Technology Based On Deep Learning
7	Research On Key Technologies Of Semantic Segmentation In Road Scene Based On Deep Learning
8	The Research On Road Scene Semantic Segmentation Method Based On Deep Learning
9	Prediction Of Passenger Travel Mode Based On Transfer Learning And Deep Structured Semantic Models
10	Research On Categorized Car-following State Prediction Model Based On Deep Convolutional Fusion