
Object-aware Video Synopsis For Surveillance Scenes

Posted on: 2022-04-12    Degree: Doctor    Type: Dissertation
Country: China    Candidate: T Ruan    Full Text: PDF
GTID: 1488306560990049    Subject: Signal and Information Processing
Abstract/Summary:
Surveillance videos play an important role in many fields such as public security, intelligent transportation, and smart homes. With the explosive growth of surveillance video, how to effectively obtain the expected information while reducing redundancy has become an urgent problem. To this end, video synopsis extracts the tubes (i.e., the trajectories of moving objects) in a surveillance scene and rearranges them along the temporal axis, which enables effective reduction and fast retrieval of long videos. A complete synopsis pipeline contains several key steps, including foreground extraction, tube construction, tube rearrangement, and tube stitching. However, four main difficulties remain: (1) online tube rearrangement is difficult, as it easily falls into local optima and thus cannot achieve the expected reduction rate; (2) foreground extraction faces challenges such as capturing multi-scale objects and low discriminative power in separating foreground from background; (3) the video data must be carefully labeled, since a new video has to be partially annotated to pre-train a deep learning-based foreground extraction model before it performs as expected; (4) a complete tube is hard to construct, because unsmooth tracking leads to the loss of moving pixels. To address these problems, this thesis focuses on video synopsis and achieves a high reduction rate while keeping most of the valuable information, through graph coloring-based online tube rearrangement, foreground extraction with multi-scale capturing and edge enhancement, pre-label-free and semantic-aware foreground extraction, and accurate tube construction with trajectory smoothing, respectively. The main contributions of this thesis are summarized as follows:

· To solve the problem of online tube rearrangement, this thesis proposes an online tube rearrangement method based on dynamic graph coloring. First, the relationships between tubes are modeled as a novel dynamic graph that can be iteratively updated for real-time rearrangement. Second, a dynamic graph coloring algorithm is proposed to update this graph, taking into account the interactions between each newly arriving tube and the tubes already received. To avoid local optima, the received tubes are also adjusted when a new tube is rearranged. Based on these methods, a complete online video synopsis framework is further designed. Experimental results on 12 surveillance videos show that the proposed method preserves 98.96% of the moving information while ensuring a high reduction rate (a simplified sketch of the coloring idea is given below).

· A deep neural network with multi-scale capturing and edge enhancement is proposed to improve the discriminative power of the video foreground extraction model and to handle multi-scale objects; it contains three modules. First, to better exploit the background information of surveillance videos, the Background Embedding Module extracts background features as a prior that is fed into the subsequent modules. Second, the Multi-scale Feature Integrating Module captures multi-scale object information by fusing the low-level and high-level features of the deep model. Finally, the Edge Enhancing Module imposes explicit constraints on the boundaries between foreground and background to make the model more discriminative (a sketch of the feature fusion and edge constraint also follows below). On the widely used CDNet2014 foreground extraction dataset, our model achieves an average F-Measure of 0.9552 when only 25 frames of each test video are used for training, surpassing several state-of-the-art methods.
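The coloring idea behind the first contribution can be pictured with a minimal greedy sketch. Everything below (`Tube`, `collides`, the slot granularity) is a hypothetical stand-in written for illustration, assuming a tube is a sequence of per-frame bounding boxes and a "color" corresponds to a candidate start slot; the thesis's dynamic algorithm, which additionally re-colors already-placed tubes, is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Tube:
    """A moving-object tube: a list of per-frame bounding boxes (x1, y1, x2, y2)."""
    tube_id: int
    boxes: List[Tuple[int, int, int, int]] = field(default_factory=list)

def boxes_overlap(a, b) -> bool:
    """Axis-aligned overlap test between two boxes."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def collides(t1: Tube, start1: int, t2: Tube, start2: int) -> bool:
    """Would the two tubes collide if started at synopsis frames start1 and start2?"""
    for i, box1 in enumerate(t1.boxes):
        j = start1 + i - start2          # matching frame index inside t2
        if 0 <= j < len(t2.boxes) and boxes_overlap(box1, t2.boxes[j]):
            return True
    return False

class GreedyTubeColoring:
    """Assign each arriving tube the smallest start slot ('color') that avoids
    collisions with the tubes already placed -- a purely greedy simplification
    of graph coloring-based online tube rearranging."""

    def __init__(self, slot_stride: int = 10):
        self.slot_stride = slot_stride                   # frames per color/slot
        self.placed: Dict[int, Tuple[Tube, int]] = {}    # tube_id -> (tube, slot)

    def add_tube(self, tube: Tube) -> int:
        slot = 0
        while any(collides(tube, slot * self.slot_stride, other, s * self.slot_stride)
                  for other, s in self.placed.values()):
            slot += 1
        self.placed[tube.tube_id] = (tube, slot)
        return slot * self.slot_stride                   # start frame in the synopsis
```

Unlike this purely greedy version, the thesis's method also adjusts previously received tubes when a new tube arrives, which is what allows the online rearrangement to escape local optima.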
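For the second contribution, the fusion of low- and high-level features and the explicit boundary constraint can be illustrated with a rough PyTorch sketch. The names and design choices below (`FuseBlock`, `edge_loss`, the Sobel-based boundary term, the channel sizes) are assumptions for illustration only; the thesis's Background Embedding, Multi-scale Feature Integrating, and Edge Enhancing modules are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FuseBlock(nn.Module):
    """Fuse a high-level (coarse) feature map into a low-level (fine) one,
    a rough stand-in for multi-scale feature integration."""

    def __init__(self, low_ch: int, high_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(low_ch + high_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # Upsample the coarse features to the fine resolution, then fuse.
        high = F.interpolate(high, size=low.shape[-2:], mode="bilinear",
                             align_corners=False)
        return self.conv(torch.cat([low, high], dim=1))

def sobel_edges(mask: torch.Tensor) -> torch.Tensor:
    """Approximate boundary map of a (B, 1, H, W) soft mask via Sobel filters."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=mask.device).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(mask, kx, padding=1)
    gy = F.conv2d(mask, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)

def edge_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Explicit constraint on foreground/background boundaries: penalize the
    difference between predicted and ground-truth edge maps."""
    return F.l1_loss(sobel_edges(pred), sobel_edges(target))
```

In such a setup, `edge_loss` would simply be added to an ordinary segmentation loss (e.g., binary cross-entropy) so that boundary pixels receive an extra, explicit penalty.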
· When a new surveillance scene arrives, it must be partially labeled to pre-train the above deep learning-based model before it can be put into use. To address this problem, this thesis proposes a pre-label-free and semantic-aware foreground extraction method. First, we design a second-order attention-based module to improve the generalization ability of the deep model, which can mine more valuable information from a more reliable background image. Based on this, accurate foreground extraction can still be performed on a new video even without pre-training on it (a simplified sketch of the attention idea appears after the summary). Second, a semantic-aware network with branch information interaction is designed to address the inability of existing methods to further perceive the semantics of moving objects, and to further improve binary foreground extraction. A new dataset named Semantic CDNet2014++ is constructed to support the training and testing of this network, and the semantic information the network generates enables more flexible video synopsis. Our model obtains an average F-Measure of 0.8330 on the test set of the new dataset, outperforming the model that requires pre-labeling by 0.1648. Moreover, we present qualitative and quantitative results of semantic-aware video synopsis.

· To solve the problem of inaccurate tube construction caused by object tracking, this thesis proposes a smoothing method for accurate tube generation. First, we propose a new evaluation metric that quantifies the “vibration” of the tracking results, so that the smoothness of the tracking trajectories can be evaluated. Then, to achieve accurate tube construction, a simple Kalman filter-based vibration reduction method is designed to correct the predictions of the tracker, leading to simultaneous improvement of smoothness and accuracy on wide-area surveillance video data. Experimental results on a public dataset demonstrate that our method significantly removes vibrations in both the horizontal and vertical directions, and further improves tracking accuracy when the target is a smoothly moving object in a wide-area surveillance video (a minimal smoothing sketch appears after the summary).

To sum up, this thesis makes an in-depth study of the essential problems of video synopsis. Specifically, it proposes novel methods for fast and efficient online tube rearrangement, reliable video foreground extraction with semantic segmentation, and accurate tube construction, which lays a foundation for the application and further research of video synopsis. Extensive experimental analyses verify the effectiveness of the proposed methods.
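One common way to realize second-order attention is to pool channel-wise covariance statistics and turn them into channel weights. The block below is a simplified sketch in that spirit; `SecondOrderChannelAttention`, its reduction ratio, and the way the statistics are pooled are assumptions, not the thesis's actual module.

```python
import torch
import torch.nn as nn

class SecondOrderChannelAttention(nn.Module):
    """Channel attention driven by second-order (covariance) statistics --
    a simplified sketch of the kind of module that can exploit a reliable
    background image without per-video pre-training."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        feat = x.reshape(b, c, h * w)
        feat = feat - feat.mean(dim=2, keepdim=True)            # center per channel
        cov = torch.bmm(feat, feat.transpose(1, 2)) / (h * w)   # (b, c, c) covariance
        stat = cov.mean(dim=2)                                  # per-channel 2nd-order statistic
        weights = self.fc(stat).view(b, c, 1, 1)
        return x * weights                                      # reweight channels
```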
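For the fourth contribution, the exact “vibration” metric is not spelled out in the abstract, so the sketch below uses the mean magnitude of the second difference of the box-center trajectory as a hypothetical proxy, and smooths the trajectory with a plain constant-velocity Kalman filter; both choices are assumptions rather than the thesis's exact design.

```python
import numpy as np

def vibration(track: np.ndarray) -> float:
    """Hypothetical 'vibration' proxy: mean magnitude of the second difference
    of an (N, 2) box-center trajectory (assumes N >= 3)."""
    accel = np.diff(track, n=2, axis=0)
    return float(np.mean(np.linalg.norm(accel, axis=1)))

def kalman_smooth(track: np.ndarray, q: float = 1e-2, r: float = 1.0) -> np.ndarray:
    """Smooth an (N, 2) center trajectory with a constant-velocity Kalman filter.
    State = [x, y, vx, vy]; q and r scale the process/measurement noise."""
    F = np.array([[1, 0, 1, 0],      # constant-velocity transition
                  [0, 1, 0, 1],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    H = np.array([[1, 0, 0, 0],      # only the position is observed
                  [0, 1, 0, 0]], dtype=float)
    Q = q * np.eye(4)                # process noise covariance
    R = r * np.eye(2)                # measurement noise covariance
    x = np.array([track[0, 0], track[0, 1], 0.0, 0.0])
    P = np.eye(4)
    out = [track[0].copy()]
    for z in track[1:]:
        # Predict the next state.
        x = F @ x
        P = F @ P @ F.T + Q
        # Correct with the measured center z.
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(4) - K @ H) @ P
        out.append(x[:2].copy())
    return np.asarray(out)
```

Comparing `vibration(track)` with `vibration(kalman_smooth(track))` then quantifies how much the trajectory jitter has been reduced.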
Keywords/Search Tags:Surveillance Video Synopsis, Moving Object Tube Rearranging, Video Foreground Extraction, Deep Learning, Object Tracking