With the rapid development of the mobile internet, the volume of video data is growing exponentially. Modeling massive video data and understanding its spatiotemporal structure without additional manual annotation has high research and practical value for scenarios such as weather forecasting, autonomous driving, and intelligent transportation systems. The core task of video prediction is to produce fine-grained forecasts of future trends by learning the mapping between historical and future data. Most current models, however, do not sufficiently extract the temporal information between video frames and pay too little attention to the moving foreground, which leads to blurry predictions and artifacts. In this thesis, optical flow features are incorporated into convolutional long short-term memory (ConvLSTM) networks to achieve better video prediction results. The main research work is as follows:

(1) To address the insufficient extraction of motion information in video prediction networks, a new encoder-decoder network based on ConvLSTM is proposed. Skip connections and an optical flow module are added to supplement the motion details of the images, yielding a video prediction model based on optical flow features, called FED-ConvLSTM. Experiments confirm that cascading the RAFT optical flow module with the FED-ConvLSTM network achieves good prediction results on three public datasets, Moving MNIST, KTH, and Human3.6M, with structural similarity (SSIM) scores of 0.928, 0.892, and 0.878, respectively.

(2) To address the insufficient attention that video prediction networks pay to the moving foreground, this thesis proposes an optical flow pyramid video prediction model based on foreground attention, called PFED-ConvLSTM. By constructing a Gaussian pyramid over the optical flow images, the proposed PFED-ConvLSTM model captures global information at large scales as well as image details at small scales. In addition, a foreground attention mechanism is designed, in which pooling operations strengthen the foreground region and suppress the background, while dilated convolutions enlarge the receptive field. The foreground attention module allocates attention resources across both the spatial and channel dimensions. The SSIM of PFED-ConvLSTM on Moving MNIST, KTH, and Human3.6M reaches 0.933, 0.903, and 0.890, respectively, further improving prediction performance.

(3) Extensive comparative experiments against existing classical video prediction models confirm that the proposed models achieve better prediction quality than those classical models and are more competitive overall.
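The Gaussian pyramid over optical flow images described above can be sketched as follows. This is a minimal NumPy illustration, not the thesis's implementation: the 5-tap Burt-Adelson blur kernel, the number of levels, and the helper names are all assumptions. Flow vectors are halved at each level so they remain in the pixel units of that level.

```python
import numpy as np

def gaussian_blur(img):
    """Separable 5-tap Gaussian blur (Burt-Adelson weights 1 4 6 4 1 / 16)."""
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    out = img.astype(np.float64)
    for axis in (0, 1):
        # Reflect-pad along this axis, then accumulate the weighted shifts.
        padded = np.pad(out, [(2, 2) if a == axis else (0, 0)
                              for a in range(out.ndim)], mode="reflect")
        n = out.shape[axis]
        acc = np.zeros_like(out)
        for i, w in enumerate(k):
            acc += w * np.take(padded, np.arange(i, i + n), axis=axis)
        out = acc
    return out

def flow_pyramid(flow, levels=3):
    """Gaussian pyramid for a 2-channel optical flow field of shape (H, W, 2).

    Coarse levels expose global motion; fine levels keep image detail.
    Vectors are rescaled by 0.5 per level to stay in pixel units.
    """
    pyramid = [flow.astype(np.float64)]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        blurred = np.stack([gaussian_blur(prev[..., c]) for c in range(2)],
                           axis=-1)
        pyramid.append(blurred[::2, ::2] * 0.5)  # downsample + rescale vectors
    return pyramid
```

A uniform flow field stays uniform under the blur (the kernel sums to one), which is a quick sanity check that only the vector magnitudes, not their directions, change across levels.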
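The foreground attention mechanism above combines pooling with dilated convolution across the channel and spatial dimensions. The NumPy sketch below assumes a CBAM-style split into a channel branch (global average- and max-pooling followed by a shared linear layer) and a spatial branch (channel-wise pooling followed by a 3x3 dilated convolution); the layer shapes, the single linear layer `w`, and the kernel are illustrative assumptions, not the thesis's actual design.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feat, w):
    """Channel branch on a (C, H, W) feature map: global average- and
    max-pooling squeeze the spatial dims, a shared linear layer w (C x C)
    scores each channel, and a sigmoid gate rescales the channels."""
    avg = feat.mean(axis=(1, 2))          # (C,)
    mx = feat.max(axis=(1, 2))            # (C,)
    gate = sigmoid(w @ avg + w @ mx)      # (C,), values in (0, 1)
    return feat * gate[:, None, None]

def spatial_attention(feat, kernel, dilation=2):
    """Spatial branch: pool across channels, then apply a 3x3 *dilated*
    convolution (zero padding) to enlarge the receptive field before the
    sigmoid gate strengthens foreground pixels and suppresses background."""
    pooled = feat.mean(axis=0) + feat.max(axis=0)      # (H, W)
    pad = dilation                                     # 3x3 kernel, pad = d
    p = np.pad(pooled, pad, mode="constant")
    h, w_ = pooled.shape
    score = np.zeros_like(pooled)
    for i in range(3):                                 # taps at offsets
        for j in range(3):                             # {-d, 0, +d}
            di, dj = i * dilation, j * dilation
            score += kernel[i, j] * p[di:di + h, dj:dj + w_]
    return feat * sigmoid(score)[None, :, :]
```

Because both gates lie in (0, 1), the module can only reweight activations, never amplify them, which is what lets it suppress background responses while preserving the foreground.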