Video Object Segmentation Based On Spatiotemporal Information Fusion And Attention Mechanism

Posted on: 2022-08-26
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
GTID: 2518306563462914
Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid development of video surveillance, short-video social networking, and related fields, massive amounts of video data have been generated. Advances in video object segmentation algorithms have promoted the widespread application of video content understanding, which is of great significance for tasks such as intelligent security, video storage, and autonomous driving. Video object segmentation aims to separate specific, salient objects from the video background and is essentially a pixel-level classification task. However, the diversity and complexity of video content lead to many problems, such as foreground-background confusion and object occlusion, which pose major challenges for video object segmentation algorithms. Following a technical route from single-object video segmentation in the unsupervised setting to multi-object video segmentation in the semi-supervised setting, this thesis carries out the following research work:

(1) A single-object video segmentation algorithm based on appearance and motion measurement and a region constraint is constructed for the unsupervised setting. To address inaccurate segmentation caused by large deformation of the target or by excessively fast target motion, an appearance and motion measurement module is built: Spacetime EMAU models the appearance of the target, and ConvLSTM models its motion. This module enables the algorithm to adapt better to the motion of different targets in the video and makes it robust to changes of varying magnitude. Because the unsupervised setting provides no prior information about the target to be segmented, the algorithm has difficulty handling scenes with complex backgrounds. For this reason, a region constraint loss function is designed: a Centerness loss term is added during model training to constrain the foreground target region to be segmented. Experiments on the DAVIS 2016 dataset verify the effectiveness of the algorithm, with a segmentation accuracy of 68.7%.

(2) A multi-object video segmentation algorithm based on similarity measurement and self-attention is constructed for the semi-supervised setting. Here, semi-supervised means that only the mask of the target in the first frame of the video is provided, and the target must be segmented precisely in the subsequent frames. The algorithm builds on the transductive video object segmentation framework (TVOS) and embeds a self-attention unit, which preserves the efficiency and real-time performance of the algorithm while improving segmentation accuracy. Experiments on the DAVIS 2017 dataset verify that embedding the self-attention mechanism enriches the detail information of the target to be segmented and refines its contours. The segmentation accuracy reaches 72.7% at a speed of 17 FPS, which demonstrates the effectiveness of the algorithm.
Keywords/Search Tags:Video Object Segmentation, ConvLSTM, Region constraint, Self-attention mechanism, Similarity measurement
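
The similarity-measurement idea behind item (2) can be illustrated with a minimal, dependency-free sketch. It is not the thesis implementation: the function names (`cosine`, `propagate_labels`), the `temperature` value, and the toy two-dimensional features are all assumptions made for illustration. The abstract does not specify how features are learned, how many reference frames are used, or where the self-attention unit sits. The sketch only shows how labels from an annotated reference frame might be propagated to a new frame by similarity-weighted voting.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def propagate_labels(ref_feats, ref_labels, query_feats, num_classes, temperature=0.1):
    """Label each query pixel by softmax-weighted voting over reference pixels.

    ref_feats:   feature vectors of pixels in the annotated frame (hypothetical input)
    ref_labels:  integer object id for each reference pixel (0 = background)
    query_feats: feature vectors of pixels in the frame to be segmented
    temperature: assumed softmax sharpness, not a value taken from the thesis
    """
    predictions = []
    for q in query_feats:
        # Scaled similarity between this query pixel and every reference pixel.
        sims = [cosine(q, r) / temperature for r in ref_feats]
        # Numerically stable softmax over the reference pixels.
        peak = max(sims)
        weights = [math.exp(s - peak) for s in sims]
        total = sum(weights)
        # Accumulate the normalized weights per object id.
        scores = [0.0] * num_classes
        for w, label in zip(weights, ref_labels):
            scores[label] += w / total
        # The object id with the largest accumulated weight wins.
        predictions.append(max(range(num_classes), key=scores.__getitem__))
    return predictions


# Toy example: three reference pixels (two background, one object) and two query pixels.
ref_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
ref_labels = [0, 0, 1]
query_feats = [[0.8, 0.2], [0.1, 0.9]]
print(propagate_labels(ref_feats, ref_labels, query_feats, num_classes=2))  # [0, 1]
```

In a real system the feature vectors would come from a learned embedding network, and the voting would typically draw on several previous frames, not a single reference frame. A lower `temperature` makes each query pixel rely more heavily on its closest reference pixels.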