
Semi-supervised Video Object Segmentation For Diverse Scenes

Posted on: 2020-03-25  Degree: Master  Type: Thesis
Country: China  Candidate: T Zhang  Full Text: PDF
GTID: 2428330590958260  Subject: Control Science and Engineering
Abstract/Summary:
As the basis of scene understanding, video object segmentation is of great significance to multiple fields in computer vision, ranging from action recognition and self-driving cars to object tracking. Given a manual annotation of the first frame, semi-supervised video object segmentation methods automatically segment the objects in subsequent frames, requiring little manual intervention. In recent years, the development of convolutional neural networks and the emergence of large datasets have significantly improved the accuracy of video object segmentation. However, most state-of-the-art segmentation approaches rely on online fine-tuning, which adds extra time cost. Moreover, video scenes are diverse: the appearance and scale of an object change over time; when multiple similar objects are present, segmentation errors arise easily; and the algorithm may miss the object when it is occluded or temporarily out of view. This thesis focuses on semi-supervised video object segmentation and, in view of the difficulties in this field and the shortcomings of existing methods, proposes two different approaches.

First, a multi-feature guided method is explored to handle large object changes and similar objects. The object appearance in the first frame, the previous frame, and the current frame is used to construct global and local appearance-matching features that capture appearance changes across the video. Meanwhile, the pixel offset to the object center is estimated to construct an object center map, which distinguishes objects at different positions, especially similar ones. These three kinds of information are combined with the backbone features from the feature extraction network to guide the model toward accurate video object segmentation.

Second, an anti-missing video object segmentation method is proposed to overcome the problem that existing methods rely on the previous mask, so the object can be lost in some scenes. The instance segmentation model Mask R-CNN is extended to the video object segmentation task. Firstly, Mask R-CNN is made category-agnostic by mapping its N classes to a single foreground class so that it segments generic objects, whose outputs are treated as candidate objects. Secondly, an additional branch is added to Mask R-CNN to extract a 256-dimensional feature vector for each candidate object. Finally, object matching and template update strategies are designed to establish the temporal correlation between frames: by computing the feature-vector similarity between each candidate object and the template, the candidate most similar to the template is found in the current frame and its mask is output. The specified object can be segmented by simply giving its bounding box in the first frame instead of a precise mask annotation, and each frame is segmented independently, unaffected by previous segmentation results.

Both approaches are evaluated on the DAVIS dataset. Among methods that do not require online fine-tuning, the proposed methods achieve the highest accuracy; compared with time-consuming online fine-tuning methods, they offer a better trade-off between accuracy and speed, effectively addressing the difficulty of video object segmentation in diverse scenes.
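To make the appearance-matching idea concrete, the following is a minimal PyTorch sketch, not the thesis implementation: the function names, the cosine-similarity matching, and the mask-based center computation (the thesis estimates the offsets with the network) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def global_matching(curr_feat, ref_feat, ref_mask):
    """Score each current-frame pixel by its best cosine similarity to the
    first-frame object pixels (a sketch of global appearance matching).
    curr_feat, ref_feat: (C, H, W) backbone features; ref_mask: (H, W) binary
    object mask at feature resolution."""
    C, H, W = curr_feat.shape
    curr = F.normalize(curr_feat.reshape(C, -1), dim=0)   # (C, H*W), unit-norm
    ref = F.normalize(ref_feat.reshape(C, -1), dim=0)     # (C, H*W), unit-norm
    sim = ref.t() @ curr                                  # pairwise cosine similarity
    obj = ref_mask.reshape(-1).bool()
    return sim[obj].max(dim=0).values.reshape(H, W)       # (H, W) match map

def center_offset_map(mask):
    """Per-pixel offset to the object's mass center, computed here from a
    given mask as a stand-in for the network-estimated offsets."""
    H, W = mask.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    m = mask.bool()
    cy, cx = ys[m].mean(), xs[m].mean()
    return torch.stack((ys - cy, xs - cx))                # (2, H, W) center map
```

Local matching would follow the same pattern with the previous frame as reference, restricted to a spatial neighborhood of each pixel.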
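Similarly, the per-frame matching step of the anti-missing method might look like the sketch below, assuming cosine similarity between the 256-dimensional embeddings and a moving-average template update; both are illustrative assumptions, and the thesis's exact matching and update strategies may differ.

```python
import torch
import torch.nn.functional as F

def match_to_template(template, cand_embs, cand_masks, momentum=0.9):
    """Select the candidate whose embedding is most similar to the template
    and return its mask, then refresh the template.
    template:   (256,)    embedding of the tracked object
    cand_embs:  (N, 256)  embeddings from the added Mask R-CNN branch
    cand_masks: (N, H, W) candidate masks from the mask head"""
    sims = F.cosine_similarity(cand_embs, template.unsqueeze(0), dim=1)  # (N,)
    best = int(sims.argmax())
    # blend the matched embedding into the template to follow appearance change
    template = momentum * template + (1.0 - momentum) * cand_embs[best]
    return cand_masks[best], F.normalize(template, dim=0)
```

Because matching runs against all candidates detected in the current frame, a re-appearing object can be recovered even after occlusion, which is the point of segmenting each frame independently.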
Keywords/Search Tags: Video object segmentation, Diverse scenes, Convolutional neural network, Appearance change, Object missing