
Deep Learning-based Video Segmentation Via Multiple Granularity Analysis

Posted on: 2019-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: R Yang
Full Text: PDF
GTID: 2428330590492341
Subject: Major in Electronic and Communication Engineering
Abstract/Summary:
Video segmentation aims to separate target objects of interest from a noisy background, and has received considerable attention owing to its wide range of computer vision applications, such as 3D reconstruction and video summarization. Numerous algorithms have been proposed during the past decade, with a focus on developing graphical models, e.g., Markov Random Field (MRF) and Conditional Random Field (CRF), to estimate target motion for each pixel (optical flow) or superpixel. Despite the favorable performance of these methods on several datasets, video segmentation still faces two main challenges. First, when graphical models are leveraged to compute temporal consistency at the pixel or superpixel level, there often exist mismatched pairs between consecutive frames. For example, the supervoxel algorithm models temporal consistency using the superpixels of each frame. The inaccuracy caused by mismatched superpixels inevitably accumulates frame by frame, and eventually causes video segmentation algorithms to fail. We also note that building a superpixel model across several frames is computationally inefficient. Second, object-level motions estimated by visual tracking algorithms often contain noisy background, because tracking results take the form of bounding boxes that do not tightly enclose the target objects. As a result, video segmentation benefits little from the recent progress of visual tracking algorithms.

To address these challenges, we present a novel framework that applies the multiple instance learning (MIL) algorithm to both the spatial and temporal domains for video segmentation. In contrast to most machine learning algorithms, which assign a label to every training instance, MIL assigns labels to bags of instances. In the binary case, a bag is labeled positive if at least one instance in that bag is positive, and negative if all the instances in it are negative. Using the labeled bags as training data, MIL is able to classify instances whose labels are missing or noisy. This motivates us to apply the MIL algorithm to compute temporal consistency in the temporal domain. For example, temporally adjacent and similar superpixels always belong to the same label (i.e., foreground or background), since motion between consecutive frames cannot be too significant. On the other hand, object-level motions estimated by visual tracking algorithms in the form of bounding boxes provide rich information for the video segmentation task, even though the bounding boxes contain some noisy background. Building on state-of-the-art tracking algorithms, we properly enlarge the tracked bounding boxes to meet the requirement of applying MIL. We find that MIL deals with the noisy background well and provides an accurate envelope of the true foreground object masks. This significantly facilitates video segmentation.

The proposed method can be regarded as a multi-granularity framework for the video segmentation problem, which effectively segments target objects from the background in a coarse-to-fine fashion. At the coarsest level (object), an off-the-shelf object tracker is applied to the whole video sequence, yielding a candidate volume of object bounding boxes. At the middle level (superpixel), we perform multiple instance learning within the candidate volume to obtain a coarse segmentation result. At the finest level (pixel), the segmentation mask is further refined via a graph-cut-like algorithm.

We comprehensively evaluate our algorithm on two popular video segmentation datasets, Segtrack 2.0 [2] and the DAVIS dataset [1] released at CVPR 2016. The results demonstrate the superiority of our video segmentation method over state-of-the-art algorithms.
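The binary bag-labeling rule and the bounding-box enlargement described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names `bag_label` and `enlarge_box`, the `(x_min, y_min, x_max, y_max)` box convention, and the `scale` parameter are all assumptions introduced here for illustration, and the abstract does not state how much the boxes are enlarged.

```python
from typing import List, Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def bag_label(instance_labels: List[bool]) -> bool:
    """Binary MIL rule: a bag is positive iff at least one instance is positive.

    A bag whose instances are all negative (or an empty bag) is negative.
    """
    return any(instance_labels)


def enlarge_box(box: Box, scale: float, width: int, height: int) -> Box:
    """Enlarge a tracked box about its center and clip it to the frame.

    `scale` is an assumed parameter; the idea is only that a looser box is
    more likely to contain the whole object, so the region it covers can be
    treated as a positive bag of superpixels.
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half_w = (x1 - x0) * scale / 2.0
    half_h = (y1 - y0) * scale / 2.0
    return (
        max(0, int(round(cx - half_w))),
        max(0, int(round(cy - half_h))),
        min(width, int(round(cx + half_w))),
        min(height, int(round(cy + half_h))),
    )
```

For instance, `bag_label([False, True, False])` evaluates to `True`, while a bag containing only background superpixels evaluates to `False`. Under this reading, superpixels inside an enlarged box would form a positive bag and regions outside the box would supply negative instances; that bag construction is an interpretation of the abstract, not a detail it confirms.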
Keywords/Search Tags: Video Segmentation, MIL, Deep Learning, Multiple Granularity Analysis