Video Scene Parsing Based On Spatio-temporal Saliency

Posted on: 2017-10-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wu
Full Text: PDF
GTID: 2518304868469254
Subject: Software engineering
Abstract/Summary:
Scene parsing is a vital step in understanding images and videos. As an active research topic in computer vision, it has broad application prospects in fields such as automatic vehicle navigation, remote sensing, automatic landform recognition, and image retrieval. Generally speaking, scene understanding involves two parts: scene content extraction and scene content recognition. This thesis investigates how to model spatio-temporal saliency in order to extract salient regions in scenes. Building on the spatio-temporal saliency model, it then focuses on improving a nonparametric scene parsing algorithm for hierarchical parsing of video scenes. The main work is summarized as follows:

1) In existing spatio-temporal saliency models, the motion description is either too simple to capture rich motion information or so complex that it is time-consuming to compute. This thesis proposes a spatio-temporal saliency model based on slow feature analysis. First, cuboids collected from different video sequences are pooled to learn slow feature functions. Then, two-layer slow feature functions are used to extract pixel-level, high-level motion features for temporal saliency computation. Finally, the resulting temporal saliency map is combined with a spatial saliency map based on Boolean maps to generate the final spatio-temporal saliency map. Experimental results on the video sequence dataset JPEGS show that the two-layer slow feature transformation effectively makes salient regions “pop out”.

2) In general nonparametric image scene parsing algorithms, image retrieval based on hand-crafted features yields relatively low per-pixel recognition rates, and superpixel classification based on KNN tends to ignore less-represented classes, which leads to low per-class recognition rates. This thesis proposes an improved scene parsing algorithm. First, a deep learning framework is used to extract CNN features, in place of traditional hand-crafted features (gist and spatial pyramid), for retrieving images of similar scenes. Then, KNN is combined with ensemble classifier techniques, and the KNN classification costs are adjusted by merging likelihood scores from different probabilistic classifiers. Experimental results on two public datasets, SIFTflow and LMSun, show that CNN features and ensemble classifier techniques boost overall performance, and that CNN features are superior to traditional hand-crafted features on scene images.

3) Traditional nonparametric scene parsing algorithms produce inaccurate likelihood scores for less-represented classes. To address this problem, this thesis proposes a video scene parsing method based on regions of interest. First, the spatio-temporal saliency model is used to obtain the salient regions of a video frame. Based on these salient regions, all superpixels are divided into foreground and background. Likelihood scores are then computed separately for foreground and background superpixels. To account for the correlation between video frames, the superpixel likelihood scores of two adjacent frames are combined to generate the final results. Experimental results on the standard dataset CamVid show that the method is superior to the traditional nonparametric scene parsing algorithm and improves overall recognition performance.
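
The slow feature analysis step in contribution 1) can be illustrated with a minimal sketch in Python with NumPy. It is a single-layer, linear version applied to a toy signal, not the two-layer model over video cuboids described in the thesis; the function name `linear_sfa`, the eigenvalue threshold, and the toy mixture are assumptions made only for illustration. The sketch whitens the input and then keeps the directions whose frame-to-frame differences have the smallest variance, which are the slowest-varying features.

```python
import numpy as np


def linear_sfa(x, n_components):
    """Single-layer linear slow feature analysis (illustrative sketch).

    x: array of shape (T, d), one observation per time step.
    Returns (W, mean) such that (x - mean) @ W gives the slow features,
    ordered from slowest to fastest.
    """
    mean = x.mean(axis=0)
    xc = x - mean

    # Whitening: rotate and scale so the centered data has identity covariance.
    cov = xc.T @ xc / (len(xc) - 1)
    eigval, eigvec = np.linalg.eigh(cov)
    keep = eigval > 1e-10  # assumed threshold to drop degenerate directions
    whiten = eigvec[:, keep] / np.sqrt(eigval[keep])
    z = xc @ whiten

    # Covariance of the temporal differences of the whitened signal.
    dz = np.diff(z, axis=0)
    dcov = dz.T @ dz / (len(dz) - 1)

    # eigh returns eigenvalues in ascending order, so the first columns
    # are the directions that change most slowly over time.
    _, dvec = np.linalg.eigh(dcov)
    W = whiten @ dvec[:, :n_components]
    return W, mean


if __name__ == "__main__":
    # Toy data: a slow and a fast sinusoid, linearly mixed into two channels.
    t = np.linspace(0.0, 2.0 * np.pi, 2000)
    slow, fast = np.sin(t), np.sin(40.0 * t)
    x = np.column_stack([slow + fast, 0.5 * slow - fast])

    W, mean = linear_sfa(x, n_components=1)
    y = (x - mean) @ W
    # The slowest feature should match the slow source up to sign and scale.
    print(abs(np.corrcoef(y[:, 0], slow)[0, 1]))
```

In a saliency setting, the rows of `x` would be vectorized spatio-temporal patches rather than two mixed sinusoids, and a second layer would typically be trained on a nonlinear expansion of the first layer's outputs; both of those choices are outside what this sketch shows.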
Keywords/Search Tags: slow feature analysis, spatio-temporal saliency, CNN features, ensemble classifier techniques, scene parsing