
Research On Video Multi-object Segmentation Algorithm Based On Multi-temporal And Multi-level Attention Network

Posted on: 2022-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: J J Wang
Full Text: PDF
GTID: 2518306563977469
Subject: Computer Science and Technology
Abstract/Summary:
Video object segmentation (VOS) is a popular research area in the computer vision community, with applications including dense object tracking for video understanding, video editing, video summarization, and autonomous driving. However, practical applications still face several difficulties. First, the target changes with the scene, which leads to frequent occlusions and scale variation. Second, the target and the background are often highly similar in appearance. Finally, existing algorithms struggle to run at high speed in practical applications, especially when segmenting multiple objects. To overcome these limitations, this thesis develops an efficient, fully end-to-end model for fast and accurate VOS, named Multi-Object Segmentation via Multi-Temporal and Multi-Level Attention Network. The details of the research are as follows.

To handle occlusion and scale variation, temporal information must be captured more effectively. Existing methods suffer from error accumulation and inaccurate matching. This thesis proposes a VOS network with a multi-temporal structure, which includes a long-term network to encode absolute object variations, a short-term network to capture relative object dynamics, and a gate-merged network to fuse long-term and short-term information. Together, the long-term and short-term structures provide the location of the target and accurate target information.

To handle the similarity between the target and the background, more discriminative target features must be extracted. Existing methods do not consider global relationship features or temporal and spatial semantics, and traditional convolutional layers cannot adaptively aggregate target features. This thesis designs a multi-level attention mechanism with three parts: a global relation attention network infers the global relationship between the current frame and the first frame; a channel-and-space attention network adapts the target features of the current frame; and a combination of temporal and spatial attention optimizes the gate-merged network, so that the target is better separated from a background of similar appearance.

To address the low speed of existing algorithms in practical applications, especially for multiple objects, this thesis proposes a multi-target segmentation network based on a single forward propagation. Multiple targets are treated as a batch and processed in one forward pass without post-processing, which avoids running the network once per object. In addition, the optical flow used by the short-term network is replaced by a mask prediction network to improve segmentation speed.

Extensive experiments on widely used benchmarks, including YouTube-VOS and DAVIS 2017, demonstrate that the proposed model achieves competitive accuracy and speed compared with several state-of-the-art methods.
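The abstract does not give the exact form of the gate-merged network, so the following is only a minimal sketch of one common way to gate two feature streams, not the thesis's implementation. It assumes an element-wise sigmoid gate with scalar parameters (`w_long`, `w_short`, `bias`, all hypothetical names), and it stands in for the "multiple targets as a batch" idea by fusing every object's features in a single call. In a real network the gate would be produced by learned convolutional layers, and the object axis would be a tensor batch dimension rather than a Python list.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_merge(long_feat: Vector, short_feat: Vector,
                w_long: float = 1.0, w_short: float = 1.0,
                bias: float = 0.0) -> Vector:
    """Fuse one object's long-term and short-term features element-wise.

    The gate parameters are hypothetical scalars used for illustration;
    they are not taken from the thesis.
    """
    if len(long_feat) != len(short_feat):
        raise ValueError("feature vectors must have the same length")
    fused = []
    for l, s in zip(long_feat, short_feat):
        # The gate decides how much of the long-term feature to keep.
        gate = sigmoid(w_long * l + w_short * s + bias)
        # Convex combination: the result lies between l and s.
        fused.append(gate * l + (1.0 - gate) * s)
    return fused


def merge_all_objects(long_batch: List[Vector], short_batch: List[Vector],
                      w_long: float = 1.0, w_short: float = 1.0,
                      bias: float = 0.0) -> List[Vector]:
    """Fuse the features of every object in one call (objects as a batch)."""
    if len(long_batch) != len(short_batch):
        raise ValueError("both batches must contain the same number of objects")
    return [gated_merge(l, s, w_long, w_short, bias)
            for l, s in zip(long_batch, short_batch)]
```

With both weights and the bias set to zero, the gate is exactly 0.5 and the fusion reduces to a plain average of the two streams; a large positive bias pushes the output towards the long-term features, and a large negative bias towards the short-term ones.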
Keywords/Search Tags: Video Object Segmentation, Deep Learning, Multi-Temporal Structure, Multi-Attention Mechanism