
Research On Video Co-segmentation

Posted on: 2016-10-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Fu
Full Text: PDF
GTID: 2308330476953375
Subject: Information and Communication Engineering
Abstract/Summary:
Image segmentation has developed rapidly over the last 20 years. Classic image segmentation algorithms use low-level visual features, e.g. color distribution, boundary strength, and visual saliency, to segment a single image. Co-segmentation algorithms, first proposed in 2006, segment multiple images that share a common foreground simultaneously. The goal of co-segmentation is to improve segmentation quality by constraining the foreground similarity across multiple images, so that the same objects receive the same labels. In computer vision tasks, it is very important to cut the target object out of a video. The object shapes obtained from segmentation, and how they change over time, are very helpful: on the one hand, object shapes provide important cues for recognition; on the other hand, a foreground region cut out of a noisy background is more discriminative than a fixed window. Many studies show that segmentation boosts computer vision tasks such as object recognition and object tracking. How to maintain the temporal coherence of the segmentation is an important problem. Traditional video segmentation algorithms compute pixel correspondences between adjacent frames by motion estimation and preserve the labeling coherence of the corresponding pixels. Such approaches perform rather well when the foreground does not move very fast, but generally fail on videos with fast-moving foregrounds or low temporal sampling rates.

This paper first reviews traditional video segmentation algorithms based on motion estimation, as well as co-segmentation algorithms that optimize jointly while constraining foreground similarity. It then proposes a joint energy function, defined over the whole video, that contains a segmentation prior term and a temporal coherence term, where the temporal coherence term encodes temporal coherence through similarity to a hyper-plane in a CNN feature space that models both the video foreground and the video background. In the experiments, segmentation results with and without the temporal coherence term are compared, indicating that the proposed term improves segmentation accuracy. Given this energy definition, the energy minimization alternates between searching a shrunken space of candidate foreground labelings and learning the model parameters.
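The abstract describes this energy only in words. A minimal sketch consistent with that description, a per-frame segmentation prior plus a hyper-plane term in CNN feature space that scores both foreground and background regions, might look like the following, where the prior E_prior, the region features phi(x_i), the trade-off lambda, and the hinge penalty are all assumed notation rather than the thesis's own:

```latex
% Sketch of the joint energy; all notation is assumed, not from the thesis.
% L = {L_t}: per-frame labelings; \phi(x_i): CNN feature of region i;
% (w, b): the shared hyper-plane; y_i(L_t) \in {-1, +1}: region i's label.
\[
E(L, w, b) \;=\; \sum_{t=1}^{T} E_{\mathrm{prior}}(L_t)
\;+\; \lambda \sum_{t=1}^{T} \sum_{i \in \mathcal{R}_t}
\max\!\bigl(0,\; 1 - y_i(L_t)\,\bigl[\,w^{\top}\phi(x_i) + b\,\bigr]\bigr)
\]
```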
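The alternating minimization can likewise be sketched in Python, under strong assumptions: each frame comes with a small pool of candidate labelings (the shrunken search space), each candidate exposes hypothetical .cost, .features, and .mask attributes, and the hyper-plane is refit as a linear SVM on region CNN features. None of these names come from the thesis:

```python
# Hypothetical sketch of the alternating minimization: the labeling step
# picks, per frame, the lowest-energy candidate from a reduced pool; the
# model step refits the shared foreground/background hyper-plane.
import numpy as np
from sklearn.svm import LinearSVC

def prior_energy(labeling):
    """Hypothetical per-frame segmentation prior (appearance, boundary, saliency)."""
    return float(labeling.cost)

def coherence_energy(labeling, clf, lam=1.0):
    """Hinge penalty of region CNN features against the shared hyper-plane."""
    margins = clf.decision_function(labeling.features)  # signed distances
    y = np.where(labeling.mask, 1.0, -1.0)              # foreground=+1, background=-1
    return lam * float(np.maximum(0.0, 1.0 - y * margins).sum())

def cosegment(candidates_per_frame, n_iters=10):
    # Initialize each frame from the prior term alone.
    current = [min(pool, key=prior_energy) for pool in candidates_per_frame]
    for _ in range(n_iters):
        # Model step: refit the hyper-plane on the current labeling.
        X = np.vstack([l.features for l in current])
        y = np.concatenate([np.where(l.mask, 1, -1) for l in current])
        clf = LinearSVC().fit(X, y)
        # Labeling step: re-search each frame's reduced candidate pool.
        current = [
            min(pool, key=lambda l: prior_energy(l) + coherence_energy(l, clf))
            for pool in candidates_per_frame
        ]
    return current
```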
To learn the temporal coherence term, the video co-segmentation problem is converted into a transductive learning problem, and the classic transductive learning algorithms are first reviewed. Considering the varied appearance changes of video foregrounds, a multi-component video co-segmentation algorithm is proposed: a temporal tree is constructed over the video's temporal domain, where each node corresponds to one frame, and each path from the root to a leaf is regarded as a component within which the appearance changes smoothly. Each component corresponds to one evolving foreground model, and the relationships among the models of different components are also discussed. In the experiments, an EM-style method is compared with the temporal-tree-based method, showing that the multi-component model is robust on videos with large foreground appearance change. The proposed algorithm is also compared with recent video binary segmentation and co-segmentation methods: it outperforms many of them and is highly robust to large motion of the foreground objects.
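The temporal tree itself is only described at a high level. One plausible construction, sketched below under assumptions (per-frame foreground descriptors and nearest-earlier-frame attachment, both illustrative), links each frame to its most appearance-similar predecessor, so that every root-to-leaf path varies smoothly:

```python
# Hypothetical sketch of the temporal tree: each node is a frame; a frame
# attaches to the most appearance-similar earlier frame, so every
# root-to-leaf path is a component whose appearance changes smoothly.
import numpy as np

def build_temporal_tree(features):
    """features: (T, d) array of per-frame foreground descriptors.
    Returns parent[t] for each frame t; frame 0 is the root."""
    parent = {0: None}
    for t in range(1, len(features)):
        dists = np.linalg.norm(features[:t] - features[t], axis=1)
        parent[t] = int(np.argmin(dists))  # attach to closest earlier frame
    return parent

def root_to_leaf_paths(parent):
    """Enumerate components: one root-to-leaf path per leaf node."""
    children = {}
    for t, p in parent.items():
        if p is not None:
            children.setdefault(p, []).append(t)
    leaves = [t for t in parent if t not in children]
    paths = []
    for leaf in leaves:
        path, t = [], leaf
        while t is not None:
            path.append(t)
            t = parent[t]
        paths.append(path[::-1])  # root -> leaf order
    return paths
```

On this reading, each returned path would receive its own slowly varying foreground model (e.g. a hyper-plane fitted on the frames along that path), and paths that share a prefix near the root would share training frames, which is one plausible interpretation of the relationship among component models that the thesis discusses.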
Keywords/Search Tags: Co-segmentation, transductive learning, convolutional neural networks