
Research On Unsupervised Video Multi-object Segmentation Algorithm

Posted on: 2022-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: X Xu
Full Text: PDF
GTID: 2518306533979609
Subject: Computer technology
Abstract/Summary:
Video object segmentation is a fundamental task in computer vision with many application scenarios, such as video surveillance, autonomous driving, and motion recognition. It also provides a technical basis for other computer vision tasks such as video summarization, scene understanding, pose estimation, behavior recognition, and video retrieval. With the development of deep learning, major breakthroughs have been made in computer vision, and models based on convolutional neural networks have achieved impressive performance on many vision tasks. Deep learning-based frameworks have brought significant performance improvements to video object segmentation and have become the mainstream and most advanced solution. At present, the mainstream approach is semi-supervised video object segmentation, in which the ground truth of the first frame of the video is given in the test phase. Because they require prior labeling, semi-supervised methods cannot be applied to unconstrained usage scenarios such as video surveillance. Unsupervised methods, which do not require any pre-labeling, therefore have broader application scenarios and greater research significance. This thesis studies unsupervised video multi-object segmentation algorithms. The research content is as follows:

(1) To address the difficulty of obtaining video object segmentation data and the high training cost of video segmentation networks, an unsupervised video multi-object segmentation method based on motion detection and object proposals is proposed. Salient object detection and object proposals are used to extract motion information and general object information, respectively. A deep learning-based fusion network is proposed to fuse the motion information with the general object information. To avoid the unreliability of single-frame segmentation results, a forward propagation refinement module propagates temporal information to the current frame to refine its segmentation result. For unsupervised video multi-object segmentation, a conflict processing mechanism is proposed to decide which object the pixels in an occluded area belong to when multiple objects in the video occlude each other. To verify the effectiveness of the algorithm, ablation experiments on the related modules and comparison experiments with current advanced algorithms are carried out on the DAVIS-16 and DAVIS-17 datasets, showing the effectiveness of each module and the good performance of the algorithm.

(2) To address the problem that, within a short time series or even a single frame, an object presents different appearances due to changes in human body posture, occlusion, camera movement, and similar factors, an unsupervised video multi-object segmentation method with a joint attention mechanism is proposed. Building on features extracted from the video frames, the method uses a joint attention module to mine the correlations between different frames of the same video and uses the global consistency information of the video to guide the segmentation. The joint attention module consists of a soft attention unit and an attention shift unit. The former emphasizes important information in a frame's features, and the latter enhances the features of the current frame by calculating the correlations between different features. In addition, stacking several joint attention modules exchanges information between frames more comprehensively and achieves better performance. The method has been tested on the DAVIS-16, DAVIS-17, and FBMS datasets. Its good performance on single-object and multi-object video segmentation tasks shows that the joint attention module can pass on the global consistency information in the video.
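The joint attention idea in (2) can be illustrated with a minimal sketch. The abstract does not give the module's equations, so everything below is an assumption made for illustration: the function names, the sigmoid gate standing in for the soft attention unit, the softmax-weighted correlation standing in for the attention shift unit, and the residual addition are all hypothetical and are not taken from the thesis. Features are represented as plain lists, one feature vector per spatial position.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def soft_attention(features):
    """Assumed soft attention unit: gate each position of a single frame.

    The gate is a sigmoid of the mean activation, which is only one
    possible scoring rule (hypothetical, not from the thesis).
    """
    gated = []
    for vec in features:
        gate = 1.0 / (1.0 + math.exp(-sum(vec) / len(vec)))
        gated.append([gate * x for x in vec])
    return gated


def attention_shift(current, reference):
    """Assumed attention shift unit: enhance the current frame with
    reference-frame features weighted by their correlation."""
    enhanced = []
    for query in current:
        # Correlation of this position with every reference position.
        weights = softmax([dot(query, key) for key in reference])
        attended = [
            sum(w * key[c] for w, key in zip(weights, reference))
            for c in range(len(query))
        ]
        # Residual connection keeps the original features (an assumption).
        enhanced.append([q + a for q, a in zip(query, attended)])
    return enhanced


def joint_attention(current, reference, num_layers=1):
    """Stack the two units num_layers times, as the abstract suggests
    stacking improves information exchange between frames."""
    out = current
    for _ in range(num_layers):
        out = attention_shift(soft_attention(out), soft_attention(reference))
    return out


# Two positions with two channels each; both frames are identical here.
frame_a = [[1.0, 0.0], [0.0, 1.0]]
frame_b = [[1.0, 0.0], [0.0, 1.0]]
result = joint_attention(frame_a, frame_b, num_layers=2)
```

In this toy input each position of the current frame correlates most strongly with the matching position of the reference frame, so the enhanced feature keeps its dominant channel. A real implementation would operate on convolutional feature maps with learned projections.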
Keywords/Search Tags:video object segmentation, salient object detection, object proposals, attention mechanism, optical flow