Research On Semi-supervised Video Object Segmentation Via Pyramid Network Modulation

Posted on: 2021-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: S H Jiang
GTID: 2428330647452391
Subject: Control Engineering
Abstract/Summary:
Video object segmentation aims to extract objects of interest from complex video scenes and segment them quickly and accurately. In real environments, however, it is still hampered by many external interference factors, and it becomes especially challenging when multiple similar targets coexist. To address both single-object and multi-object segmentation in complex video scenes, this thesis proposes a semi-supervised video object segmentation algorithm based on pyramid network modulation. The main research work is as follows:

For complex video scenes involving target scale change and uneven color, a semi-supervised single-object video segmentation algorithm based on pyramid pooling network modulation is proposed. First, a single forward pass of the modulation network adapts the segmentation model to the appearance of a given object: a modulator learned from the visual and spatial information of the target object modulates the intermediate layers of the segmentation network, so that the network adapts to the appearance changes and displacement of that specific object. Second, global context information is aggregated at the last layer of the segmentation network through a multi-region context fusion method. Finally, high-level and low-level features of the segmentation network are fused directly to compensate for the lack of target detail in the last layer. The proposed method is a network that can be trained end-to-end. Extensive experiments on the DAVIS 2016 and DAVIS 2017 datasets show that it achieves results competitive with more advanced methods that use online fine-tuning, while running at 0.14 s per frame on a single GPU.

To address the problem that segmentation results for multiple objects in a video are not sufficiently distinct, this thesis further proposes a semi-supervised multi-object video segmentation algorithm based on dual pyramid network modulation. Here, progressive fusion of high-level and low-level feature information is added after the last layer of the segmentation network. Specifically, high-level semantic feature maps are constructed at all scales through a laterally connected, left-to-right structure, so that the target location and detail information in the low-level features is fully integrated with the strong semantic information in the high-level features, improving the segmentation results. Experiments show that the proposed method effectively improves on the multi-object segmentation accuracy of the preceding method: accuracy on the DAVIS 2016 and DAVIS 2017 datasets increases by 0.9 and 2 percentage points, respectively.

In addition, this thesis studies the real-time performance of the algorithm. By training on additional large-scale data and using a lightweight network, the segmentation model runs at 0.06 s per frame on a single GPU, which makes the segmentation algorithm more practical.
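
As a rough illustration of two ideas named in the abstract, the sketch below shows (1) modulating intermediate feature maps with a per-channel scale and a per-pixel bias, and (2) pooling a feature map over grids of several sizes to summarise multi-region context. It is a minimal sketch under stated assumptions, not the thesis implementation: the function names, the scale-plus-bias form of the modulation, the grid sizes and the toy values are all hypothetical, and a real pyramid pooling module would typically also upsample the pooled maps and combine them with the original features.

```python
# Illustrative sketch only: names, shapes and values are assumptions,
# not taken from the thesis.

def modulate(features, channel_scale, spatial_bias):
    """Scale each channel by its own weight, then add a shared per-pixel bias.

    features:      list of C channels, each an H x W list of lists.
    channel_scale: list of C floats (hypothetical visual-modulator output).
    spatial_bias:  H x W list of lists (hypothetical spatial-modulator output).
    """
    return [
        [
            [scale * value + spatial_bias[y][x] for x, value in enumerate(row)]
            for y, row in enumerate(channel)
        ]
        for channel, scale in zip(features, channel_scale)
    ]


def average_pool(channel, bins):
    """Average one H x W channel over a bins x bins grid of regions.

    Assumes bins <= min(H, W), so that every region contains at least one cell.
    """
    height, width = len(channel), len(channel[0])
    pooled = []
    for by in range(bins):
        y0, y1 = by * height // bins, (by + 1) * height // bins
        row = []
        for bx in range(bins):
            x0, x1 = bx * width // bins, (bx + 1) * width // bins
            cells = [channel[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def pyramid_pool(channel, bin_sizes=(1, 2)):
    """Pool one channel at several grid sizes (coarse global to finer regional)."""
    return {bins: average_pool(channel, bins) for bins in bin_sizes}


# Toy input: 2 channels of size 2 x 2.
features = [
    [[1.0, 2.0], [3.0, 4.0]],  # channel 0
    [[0.0, 1.0], [1.0, 0.0]],  # channel 1
]
channel_scale = [2.0, 0.5]               # one weight per channel
spatial_bias = [[0.0, 1.0], [0.0, 0.0]]  # one offset per pixel

modulated = modulate(features, channel_scale, spatial_bias)
context = pyramid_pool(modulated[0])
```

In this toy run, `modulated` holds the rescaled and shifted channels, and `context` maps each grid size to its pooled summary of channel 0, with the 1 x 1 grid giving a single global average.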
Keywords/Search Tags: Video object segmentation, Pyramid pooling model, Multi-scale fusion, Fully convolutional network, Deep learning