| Video object segmentation is the task of separating foreground objects from background regions in videos, that is, classifying every pixel. It has a wide range of applications in video coding, pose analysis, autonomous driving, and short-video entertainment. According to the form of annotation provided at test time, the task falls into four categories: unsupervised, semi-supervised, weakly supervised, and interactive. With the continued development of deep learning, video object segmentation has made great progress, but many existing algorithms improve segmentation accuracy at the expense of speed, and segmentation performance is often poor in complex scenes. This paper studies weakly supervised and semi-supervised video object segmentation and proposes a video object segmentation algorithm based on hierarchical feature fusion guided by correlation signals. It uses correlation signals, generated within the network, that carry object contour and texture information to assist segmentation, which effectively improves segmentation performance while leaving inference speed essentially unchanged. The research results are as follows. The first work proposes a weakly supervised video object segmentation algorithm based on correlation feature guidance and dynamic search update. Existing weakly supervised algorithms make insufficient use of correlation features during segmentation, and their segmentation results depend heavily on the tracking results; to address this, the algorithm introduces a feature fusion segmentation module and a dynamic search update mechanism. First, the feature fusion segmentation module fuses the correlation features of the tracking network with the features of the backbone network through an encoder network, and the correlation features are then used to strengthen the fused features, from which a decoder network outputs the segmentation result. Then, through the dynamic
search update mechanism, the algorithm uses a balanced state evaluation score to integrate the tracking bounding box and the segmentation contour box and to select the optimal crop of the search image, which reduces the dependence of the segmentation results on the tracking results. Finally, the algorithm is evaluated on the DAVIS 2016, DAVIS 2017, and YouTube-VOS datasets. On the multi-object DAVIS 2017 validation set, its inference speed is essentially unchanged relative to the baseline method while accuracy improves by 1.2%, which demonstrates the effectiveness of the proposed weakly supervised video object segmentation algorithm. To address the imprecise handling of target details in the first work, this paper carries the idea of correlation signal guidance over to the semi-supervised setting. It also improves the network of the first work and proposes a semi-supervised video object segmentation algorithm based on the fusion of deep and shallow representations, built around an efficient high-order attention model and a fusion segmentation module. First, the efficient high-order attention model integrates spatial and channel attention, which allows it to better extract the deep semantic information of the image. Then, the fusion segmentation module uses pixel-level correlation signals to fuse deep semantic features with shallow location features, so that the network learns more robust features, distinguishes similar objects against complex backgrounds, and improves both the handling of target details and segmentation accuracy. Finally, the algorithm is evaluated on the popular video object segmentation datasets DAVIS 2016, DAVIS 2017, and YouTube-VOS. On the multi-object DAVIS 2017 validation set, the proposed algorithm improves on the baseline method by 1.1 percentage points and performs well in terms of both speed and
accuracy, which demonstrates the advantage of this algorithm. |
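
The dynamic search update described above combines the tracking bounding box with the box enclosing the predicted segmentation before cropping the next search image. The abstract does not define the balanced state evaluation score, so the following is only a minimal sketch under stated assumptions: `select_search_box`, its confidence inputs, and the `agree_thresh` parameter are hypothetical stand-ins for illustration, not the authors' formulation.

```python
from typing import List, Optional, Tuple

# (x_min, y_min, x_max, y_max) in inclusive pixel coordinates
Box = Tuple[int, int, int, int]


def mask_to_box(mask: List[List[int]]) -> Optional[Box]:
    """Tight box around the nonzero pixels of a binary mask, or None if empty."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two inclusive pixel boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + 1
    ih = min(a[3], b[3]) - max(a[1], b[1]) + 1
    inter = max(iw, 0) * max(ih, 0)
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)


def select_search_box(
    track_box: Box,
    track_conf: float,
    mask: List[List[int]],
    mask_conf: float,
    agree_thresh: float = 0.5,  # hypothetical agreement threshold
) -> Box:
    """Hypothetical selection rule (an assumption, not the paper's score).

    If the two boxes agree, blend them by confidence; if they disagree,
    keep the box from the more confident source.
    """
    seg_box = mask_to_box(mask)
    if seg_box is None:
        # No segmentation evidence: fall back to the tracker.
        return track_box
    total = track_conf + mask_conf
    if total > 0 and box_iou(track_box, seg_box) >= agree_thresh:
        w = mask_conf / total
        return tuple(round((1 - w) * t + w * s) for t, s in zip(track_box, seg_box))
    return seg_box if mask_conf > track_conf else track_box
```

For example, when the tracker has drifted away from the object and the mask is predicted with higher confidence, the sketch returns the mask-derived box, which mirrors the stated goal of reducing the dependence of segmentation on tracking.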