Research On Video Multi-object Segmentation Algorithm Based On Multi-temporal And Multi-level Attention Network

Posted on:2022-01-08

Degree:Master

Type:Thesis

Country:China

Candidate:J J Wang

GTID:2518306563977469

Subject:Computer Science and Technology

Abstract/Summary:

Video object segmentation (VOS) is an active research area in computer vision, with applications including dense object tracking for video understanding, video editing, video summarization, and autonomous driving. Practical use still faces three difficulties. First, targets change with the scene, which leads to frequent occlusion and scale variation. Second, the target and the background are often very similar in appearance. Third, existing algorithms struggle to run at high speed in practice, especially when segmenting multiple objects. To address these difficulties, this thesis develops an efficient, fully end-to-end model for fast and accurate VOS, named Multi-Object Segmentation via Multi-Temporal and Multi-Level Attention Network. The research is detailed as follows.

For occlusion and scale variation, temporal information must be captured more effectively; existing methods suffer from error accumulation and inaccurate matching. This thesis proposes a VOS network with a multi-temporal structure, which includes a long-term network that encodes absolute object variations, a short-term network that captures relative object dynamics, and a gate-merged network that fuses long-term and short-term information. Together, the long-term and short-term structures provide the target's location and accurate target information.

For the similarity between target and background, more discriminative target features must be extracted. Existing methods do not consider global relationship features or temporal and spatial semantics, and traditional convolutional layers cannot adaptively aggregate target features. This thesis designs a multi-level attention mechanism that infers global relationship information between the current frame and the first frame through a global relation attention network, adapts the target features of the current frame through a channel-and-space attention network, and optimizes the gate-merged network by combining temporal and spatial attention, so that the target is better separated from a similar-looking background.

For speed, this thesis proposes a multi-target segmentation network based on a single forward propagation. Multiple targets are treated as a batch and processed in one forward pass without post-processing, which avoids running the network once per object. In addition, the optical flow used by the short-term network is replaced by a mask prediction network to improve segmentation speed.

Extensive experiments on widely used benchmarks, including YouTube-VOS and DAVIS 2017, demonstrate that the proposed model achieves competitive accuracy and speed compared with several state-of-the-art methods.
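The abstract does not give the formula used by the gate-merged network, so the following is only a minimal sketch of one common way to fuse two feature streams with a learned gate, combined with the idea of treating all objects as a batch handled in a single call. The function names (`gated_merge`, `merge_objects`), the scalar weights `w_long`, `w_short`, `bias`, and the use of plain Python lists as "features" are all assumptions made for illustration; they are not taken from the thesis.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_merge(long_feat, short_feat, w_long, w_short, bias):
    """Fuse one object's long-term and short-term features element-wise.

    Assumed form (not from the thesis): a gate g in (0, 1) is computed
    from both inputs, and the output is the convex combination
    g * long + (1 - g) * short.
    """
    fused = []
    for l, s in zip(long_feat, short_feat):
        g = sigmoid(w_long * l + w_short * s + bias)
        fused.append(g * l + (1.0 - g) * s)
    return fused


def merge_objects(long_batch, short_batch, w_long=0.0, w_short=0.0, bias=0.0):
    """Treat objects as a batch: one call fuses the features of every object.

    long_batch and short_batch each hold one feature list per object.
    """
    return [
        gated_merge(l, s, w_long, w_short, bias)
        for l, s in zip(long_batch, short_batch)
    ]


if __name__ == "__main__":
    # Two hypothetical objects, each with a 2-element feature vector.
    long_batch = [[1.0, 3.0], [0.0, 4.0]]
    short_batch = [[3.0, 1.0], [2.0, 0.0]]
    # With zero weights and zero bias the gate is 0.5, so the result
    # is the plain average of the two streams.
    print(merge_objects(long_batch, short_batch))
```

In this sketch, a large positive `bias` pushes the gate toward 1 so the output follows the long-term stream, and a large negative `bias` makes it follow the short-term stream; a trained network would instead predict the gate from the features themselves.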

Keywords/Search Tags:

Video Object Segmentation, Deep Learning, Multi-Temporal Structure, Multi-Attention Mechanism

Related items

1	Temporal Information And Multi-Scale Fusion Based Video Object Detection
2	A Research Of Video Question Answering Based On Deep Learning
3	Multi-Person Pose Estimation Based On Deep Learning
4	Research On Key Technologies Of Video Group Activity Analysis And Recognition
5	Research On Unsupervised Video Multi-object Segmentation Algorithm
6	Research Of Spatio-temporal Attention Mechanism For Weakly Supervised Object Detection And Segmentation
7	Research On Video Object Detection Based On Multi-Attention Mechanism
8	Research On Interactive Video Object Segmentation Based On Deep Learning
9	Research On Video Object Segmentation Algorithm Based On Learning Attention Modulation Network
10	An Improved Automatic Math Problem Solver Based On Temporal Convolutional Networks And Multi-head Attention