Video Scene Parsing Based On Spatio-temporal Saliency

Posted on:2017-10-24

Degree:Master

Type:Thesis

Country:China

Candidate:Y Wu

Full Text:PDF

GTID:2518304868469254

Subject:Software engineering

Abstract/Summary:

PDF Full Text Request

Scene parsing is a vital step in understanding images and videos.As a hot spot in computer vision,scene parsing has a wide application prospect in many fields,such as automatic vehicle navigation,remote sensing,automatic recognition of landforms and image retrieval,and so on.Generally speaking,scene understanding involves two parts: scene content extraction and scene content recognition.In this paper,we investigate that how to model spatio-temporal saliency to extract salient regions in scenes.Then based on spatio-temporal saliency model,we focus on how to improve nonparametric scene parsing algorithm for hierarchical parsing in video scenes.The main work is summarized as follows:1)In existing spatio-temporal saliency models,motion description is too simple to contain plentiful motion information,or is too complex as to require more time consuming.This paper proposes a spatio-temporal saliency model based slow feature analysis.Firstly,all cuboids collected from different video sequences are mixed to learn slow feature functions.Then,we exploit two-layer slow feature functions to extract pixel-level high-level motion features for temporal saliency computation.Finally,we combine obtained temporal saliency map and a spatial saliency based on boolean map to generate final spatio-temporal saliency map.Experimental results on video sequences dataset JPEGS show the two-layer slow feature transformation effectively make salient regions �pop out�.2)In general nonparametric image scene parsing algorithm,image retrieval based on hand-crafted features produces relatively low pixel recognition,and superpixel classification based on KNN tends to ignore less-represented classes as to obtain low class recognition.In this paper,we propose an improved scene parsing algorithm.We firstly use deep learning framework to extract CNN features instead of traditional hand-crafted features(gist and spatial pyramid)for image retrieval of similar scenes.Then,we combine KNN and ensemble classifier techniques,and adjust KNN classification costs through merging likelihood scores from different probabilistic classifiers.Experimental results on two public datasets SIFTflow and LMSun show CNN features and ensemble classifier techniques boost overall performance,and CNN features are superior to traditional hand-crafted features in scene images.3)Traditional nonparametric scene parsing algorithm produces inaccurate likelihood scores in less-represented classes.To address this problem,we propose a video scene parsing method based regions of interest.Firstly,we exploit spatio-temporal saliency model to obtain salient regions of a video frame.Based on obtained salient regions,all superpixels are divided into foreground and background.Then we compute likelihood scores for foreground and background superpixels respectively.Considering the correlation between video frames,we combine superpixels likelihood scores of two adjacent frames to generate final results.Experimental results on standard dataset Cam Vid show that our method is superior to traditional nonparametric scene parsing algorithm and improves overall recognition performance.

Keywords/Search Tags:

slow feature analysis, spatio-temporal saliency, CNN features, ensemble classifier techniques, scene parsing

PDF Full Text Request

Related items

1	Research On Visual Saliency Detection Based On Spatio-temporal Features
2	Research On Surveillance Video Synopsis Based On Spatio-Temporal Slice
3	Probabilistic Shape Parsing and Action Recognition Through Binary Spatio-Temporal Feature Description
4	Research On Methods Of Video Content Analysis Based On Spatio-temporal Variation
5	Research Of Mean Shift Target Tracking Algorithm Based On Spatio-temporal Visual Saliency Features
6	Research On Spatio-Temporal Indexing Mechanism And Querying Strategy
7	Research On Semantic Scene Parsing
8	Research On The Approach Of Human Action Recognition Based On Spatio-temporal Features
9	Research On The Extraction Of Salient Object Based On Spatio-Temporal Analysis
10	Research On The Local Spatio-Temporal Relationships Based Feature Model For Action Recognition