Research On Unsupervised Video Object Segmentation Algorithm Based On Joint Spatio-temporal Features

Posted on: 2022-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: H L Zhu
GTID: 2518306563962539
Subject: Computer technology
Abstract/Summary:
Unsupervised video object segmentation requires an algorithm to automatically segment the most salient object in an entire video sequence without any annotation of the foreground object. The task has attracted widespread attention from researchers because of its broad application prospects. Because video content is rich and scenes are complex, foreground objects are subject to occlusion, fast motion, severe deformation, and background confusion, all of which make accurate and stable unsupervised video object segmentation challenging. In recent years, the Internet, big data, and 5G technology have developed rapidly, and the amount of video data has soared. Using artificial intelligence to automatically analyze massive amounts of video will become a mainstream trend. Since video data has rich temporal and spatial features, making full and effective use of them helps address the challenges of this task. Research on unsupervised video object segmentation algorithms based on joint spatio-temporal features therefore has considerable research value and theoretical significance.

To address the problem of how to make full use of the spatio-temporal features of video data in an unsupervised video object segmentation model, this thesis studies two technical routes: optical flow methods and non-optical flow methods. The optical flow based work focuses on two aspects: bidirectional motion cues refinement and a multi-level feature aggregation strategy. For end-to-end unsupervised video object segmentation (the non-optical flow method), the research focuses on fully mining the spatio-temporal correlation between video frames from the perspective of the video data itself. The main research work of this thesis is summarized as follows.

Firstly, an unsupervised video object segmentation algorithm based on bidirectional motion cues refinement is proposed. Single-directional optical flow cannot fully represent the motion pattern of the object to be segmented, which makes motion estimation inaccurate and in turn degrades segmentation accuracy for objects in complex motion. Based on this insight, this thesis introduces bidirectional optical flow features and proposes a Motion-cues Refine Module, which is integrated into a motion saliency segmentation network. The motion cues of the foreground object are fully utilized to improve segmentation accuracy. Experiments on the DAVIS-2016 dataset demonstrate the effectiveness of the proposed algorithm. The results show that, compared with single-directional optical flow, the proposed Motion-cues Refine Module improves the segmentation accuracy of the base network by 13.6%.

Secondly, an unsupervised video object segmentation algorithm based on feature aggregation and motion refinement is proposed. To fully combine the spatio-temporal features of video sequences, the Motion-cues Refine Module from the first contribution is incorporated into a dual-stream co-enhanced network based on multi-level feature aggregation and motion-cues refinement. The network consists of an appearance stream and a motion stream. Because features at different levels contribute differently to segmentation performance, an appearance saliency segmentation network is designed that includes a Context Attention Module and a Multi-level Feature Aggregation Module. Effectively integrating features from different levels improves the ability of the appearance features to represent the foreground object. The dual-stream co-enhanced network combines the appearance saliency with the refined motion saliency and improves the segmentation accuracy of the whole network through a co-enhancement process. Comparison experiments on the DAVIS-2016, SegTrack-v2, and VideoSD datasets verify the effectiveness of the proposed algorithm. The segmentation accuracy on the DAVIS-2016 dataset reaches 79.6%, surpassing UOVOS, FSEG, LVO, and other mainstream algorithms of the same kind.

Finally, an unsupervised video object segmentation algorithm based on adaptive spatio-temporal information selection is proposed. Unsupervised video object segmentation based on optical flow requires pre-calculated optical flow, so it cannot be performed end to end. To fully explore the rich inter-frame spatio-temporal relationships in video sequences, an adaptive spatio-temporal feature selection network is proposed that performs end-to-end segmentation without any external auxiliary information. A memory is constructed from multiple frames of the video, and the most favorable spatio-temporal features are adaptively selected from the memory to enhance the representation ability of the current frame's features. Extensive ablation studies and comparison experiments on the mainstream DAVIS-2016, SegTrack-v2, and VideoSD datasets verify the effectiveness of the proposed algorithm. The segmentation accuracy on the DAVIS-2016 dataset reaches 77.6%, which is better than most optical flow based methods.

This thesis contains 37 figures, 12 tables, and 72 references.
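The memory-based selection described in the final contribution can be illustrated with a minimal sketch. The abstract does not specify how features are selected from the memory, so the scaled dot-product attention used here is an assumed stand-in, not the thesis's network, and the names `read_memory`, `query`, and `memory` are hypothetical. The sketch shows one current-frame feature vector being enhanced by a similarity-weighted read over feature vectors stored from other frames.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def read_memory(query, memory):
    """Enhance one current-frame feature with a weighted read from memory.

    Assumption: selection is modelled as scaled dot-product attention;
    the thesis abstract does not state the actual mechanism.
    """
    scale = math.sqrt(len(query))
    # Similarity of the current-frame feature to every stored feature.
    weights = softmax([dot(query, m) / scale for m in memory])
    # Weighted sum of memory features: similar entries contribute more.
    read = [sum(w * m[i] for w, m in zip(weights, memory))
            for i in range(len(query))]
    # Residual combination keeps the original feature and adds the read.
    enhanced = [q + r for q, r in zip(query, read)]
    return weights, enhanced


# Toy data: one 2-D query and three stored 2-D memory features.
query = [1.0, 0.0]
memory = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, enhanced = read_memory(query, memory)
```

In this toy example the memory entry most similar to the query receives the largest weight, which is the sense in which the read is "adaptive"; a real network would apply the same idea per spatial location on learned feature maps.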
Keywords/Search Tags: Unsupervised, Video Object Segmentation, Spatio-temporal features, Motion cues, Dual-stream network