
Single-Stage Temporal And Spatial Super-Resolution Reconstruction Of Traffic Video Based On Shifted Window Attention

Posted on: 2024-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Qin
Full Text: PDF
GTID: 2542307157474994
Subject: Traffic and Transportation Engineering
Abstract/Summary:
High-resolution traffic surveillance video helps people grasp real-time road conditions more effectively. However, the acquisition, transmission and storage of traffic surveillance video are often accompanied by a loss of resolution, and video super-resolution reconstruction is a common way to address this problem. To tackle the large parameter counts and poor reconstruction quality of existing traffic video super-resolution networks, this paper constructs two video spatio-temporal super-resolution reconstruction models to improve the resolution of traffic scene videos. The main work is as follows:

1. To address the failure of existing video super-resolution methods to make effective use of the correlation between temporal and spatial information, a two-stage video spatio-temporal super-resolution reconstruction model (SAM-VSSR) based on the self-attention mechanism is constructed, consisting of four modules: feature extraction, feature interpolation, feature fusion and feature reconstruction. In the feature fusion module, a self-attention mechanism and a global temporal feature fusion method reassign weights to the temporal and spatial information in the video frame feature sequence so that the features of the frame sequence are fused effectively; at the same time, by aggregating long- and short-term motion cues between video frames, the model alleviates the poor reconstruction caused by large motion between traffic video frames.

2. A single-stage video spatio-temporal super-resolution reconstruction model (SWA-VSSR) based on shifted window attention is constructed to address the complex structure and large parameter count of the two-stage SAM-VSSR model. The feature interpolation and fusion part uses a Swin Transformer network based on the shifted window attention mechanism: attention is computed within local windows that are partitioned and then shifted, so that a Transformer architecture originally designed for natural language processing can be applied to video (see the illustrative sketch after this summary). The model also integrates the temporal and spatial video super-resolution sub-tasks into a single-stage codec network, which reconstructs high-resolution video with fewer model parameters and improves computational efficiency.

3. The SAM-VSSR and SWA-VSSR models were trained and tested on the public Vid4 and Vimeo-90K datasets as well as on traffic video datasets collected by the author. On the Vid4 dataset, the SAM-VSSR model achieved a peak signal-to-noise ratio of 26.35 dB and a structural similarity of 0.8023, while the peak signal-to-noise ratio of the SWA-VSSR model was …. Compared to the Zooming Slow-Mo model, the SAM-VSSR model showed a 0.58% improvement in structural similarity and a 19% reduction in the number of parameters.
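To make the shifted-window attention idea behind SWA-VSSR concrete, the following is a minimal PyTorch sketch: a frame's feature map is partitioned into non-overlapping windows, multi-head self-attention is computed inside each window, and the windows are cyclically shifted between successive blocks so that neighbouring windows can exchange information. All names, shapes and hyper-parameters here are illustrative assumptions rather than the thesis's actual implementation, and the boundary mask used by the full Swin Transformer for shifted windows is omitted for brevity.

```python
# Minimal, illustrative sketch of (shifted) window attention; not the SWA-VSSR code.
import torch
import torch.nn as nn

def window_partition(x, win):
    """Split a (B, H, W, C) feature map into non-overlapping win x win windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // win, win, W // win, win, C)
    # -> (B * num_windows, win * win, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, win * win, C)

def window_reverse(windows, win, H, W):
    """Merge windows back into a (B, H, W, C) feature map."""
    B = windows.shape[0] // ((H // win) * (W // win))
    x = windows.view(B, H // win, W // win, win, win, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)

class ShiftedWindowAttention(nn.Module):
    """Multi-head self-attention restricted to (optionally shifted) local windows."""
    def __init__(self, dim, win=8, heads=4, shift=0):
        super().__init__()
        self.win, self.shift = win, shift
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):  # x: (B, H, W, C), with H and W divisible by win
        B, H, W, C = x.shape
        if self.shift:
            # Cyclic shift so that the next window partition straddles the
            # previous window boundaries (boundary mask omitted in this sketch).
            x = torch.roll(x, shifts=(-self.shift, -self.shift), dims=(1, 2))
        w = window_partition(x, self.win)       # (B * num_windows, win*win, C)
        w, _ = self.attn(w, w, w)               # attention within each window only
        x = window_reverse(w, self.win, H, W)
        if self.shift:
            x = torch.roll(x, shifts=(self.shift, self.shift), dims=(1, 2))
        return x

# Toy usage: a 64x64 feature map with 32 channels, alternating a regular
# window-attention block with a shifted one, as in a Swin-style stage.
feat = torch.randn(1, 64, 64, 32)
block1 = ShiftedWindowAttention(32, win=8, heads=4, shift=0)
block2 = ShiftedWindowAttention(32, win=8, heads=4, shift=4)
out = block2(block1(feat))
print(out.shape)  # torch.Size([1, 64, 64, 32])
```

Because attention is computed only inside small windows, the cost grows linearly with the number of windows rather than quadratically with the full frame size, which is what allows a single-stage Transformer model to stay lightweight; the alternating shift is what lets information still propagate across the whole frame over successive blocks.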
Keywords/Search Tags: Traffic video, Video super-resolution reconstruction, Deep learning, Self-attention mechanism, Codec network, Multi-scale features