| Video object segmentation performs pixel-level classification on video sequences and exploits inter-frame relationships to segment objects accurately. The task has broad application value in video editing, human-computer interaction, and autonomous driving. With the rapid development of computer vision, many classical networks and large datasets have been proposed, and video object segmentation has received growing attention. However, existing semi-supervised video object segmentation methods are not accurate enough when segmenting small objects or discriminating between similar objects. Moreover, most methods do not run in real time and are difficult to apply in practice. This paper therefore addresses three shortcomings of existing semi-supervised video object segmentation methods: insufficient segmentation accuracy, weak discriminative performance, and model complexity. First, to address the inaccurate segmentation of small objects and object edges, this paper proposes a semi-supervised video object segmentation method based on multiple feature enhancement. After feature extraction in the encoding stage, a lightweight multi-scale feature fusion module combines deep semantic information with shallow detail features to enhance the features at minimal computational cost. In the decoding stage, a cross-dimensional interactive attention module mines the interactions and dependencies between different feature dimensions, further strengthening feature representation. Second, to address the weak discrimination of similar objects, this paper proposes a semi-supervised video object segmentation method based on an adaptive guided discriminator. To improve the discriminator network's ability to distinguish similar objects, a multiple guidance module is proposed to optimize the guiding role of the discriminator in subsequent segmentation, thereby enhancing the overall discriminative ability of the network. In addition, an adaptive sample pool update strategy and two update indicators are proposed based on the confidence of the prediction results; these reduce the impact of unreliable samples on discriminative ability and accelerate network inference to a certain extent. Finally, the proposed methods are evaluated on the DAVIS 2016, DAVIS 2017, and YouTube-VOS datasets, and the effectiveness of the two semi-supervised video object segmentation algorithms is assessed in terms of improvements in segmentation accuracy, discriminative performance, and real-time performance. |
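
The confidence-based sample pool update mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the two update indicators, so this example substitutes a single hypothetical confidence score per frame; the names `Sample`, `SamplePool`, `capacity`, and `min_confidence`, as well as the threshold values, are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    frame_index: int
    # Hypothetical score: mean per-pixel confidence of the predicted mask, in [0, 1].
    confidence: float


@dataclass
class SamplePool:
    capacity: int = 3            # hypothetical pool size
    min_confidence: float = 0.8  # hypothetical admission threshold
    samples: List[Sample] = field(default_factory=list)

    def update(self, candidate: Sample) -> bool:
        """Admit the candidate only if it looks reliable; return True if the pool changed."""
        if candidate.confidence < self.min_confidence:
            # Unreliable prediction: skip the update, which also saves its cost.
            return False
        if len(self.samples) < self.capacity:
            self.samples.append(candidate)
            return True
        # Pool is full: replace the least confident stored sample if the candidate beats it.
        weakest = min(self.samples, key=lambda s: s.confidence)
        if candidate.confidence > weakest.confidence:
            self.samples.remove(weakest)
            self.samples.append(candidate)
            return True
        return False


def run_demo() -> List[int]:
    """Feed four frames with made-up confidences and return the retained frame indices."""
    pool = SamplePool(capacity=2, min_confidence=0.8)
    for idx, conf in enumerate([0.95, 0.60, 0.85, 0.90]):
        pool.update(Sample(frame_index=idx, confidence=conf))
    return sorted(s.frame_index for s in pool.samples)
```

In this sketch, low-confidence frames never enter the pool, so they cannot degrade whatever model is later fitted to the pooled samples, and skipped updates reduce per-frame work. How the actual method scores confidence and decides when to update may differ.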