Semi-supervised video object segmentation has become a research hotspot in computer vision as video data is generated at ever larger scale. It automatically segments the objects of interest in a video sequence, improves the efficiency of downstream tasks, and is widely used in fields such as autonomous driving, video editing, and video surveillance. However, because video data is complex, the task still faces challenges such as occlusion, changes in object appearance, and similar-looking objects, and existing methods cannot handle multiple difficult scenarios robustly. This thesis therefore proposes solutions from two perspectives and verifies them through extensive experiments. The main research work and innovations of this thesis are as follows:

1. Propagation-based methods propagate the object smoothly but cannot adapt to scenes with occlusion or fast motion, whereas matching-based methods can match object appearance across different times but struggle to adapt to changes in that appearance. To address this, a two-branch model combining propagation and matching is proposed. The reference-guided propagation module combines the feature information of the first frame and the previous frame so that propagation makes full use of the object guidance. The global matching module uses a proposed dynamic update method to retain as much information from all past frames as possible, so as to better adapt to changes in object appearance. Experimental results on the public dataset show that the model surpasses most existing methods in both accuracy and efficiency.

2. Existing memory-based methods incur a cost that grows linearly with time when processing long videos. To address this, a cascaded semi-supervised video object segmentation method based on an adaptive memory module is proposed. The adaptive memory module avoids this linear growth, which makes the method suitable for long videos. To improve efficiency and accuracy, a cascaded tracking-to-segmentation network first reduces the image resolution, which improves the efficiency of the subsequent segmentation model, and a boundary prediction branch is then added to help improve the accuracy of the segmentation results. Experimental results on the public dataset show that, compared with memory-mechanism methods, the method improves accuracy, achieves higher efficiency, and does not increase computational cost over time.
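The contrast between a memory that stores every past frame and one whose size stays bounded can be illustrated with a minimal sketch. The abstract does not specify how the adaptive memory module works, so everything below is an assumption made for illustration: the class name `BoundedMemory`, the fixed `capacity`, the cosine-similarity slot selection, and the moving-average merge with `momentum` are hypothetical and are not the thesis's actual design.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class BoundedMemory:
    """Hypothetical fixed-capacity memory (illustrative, not the thesis's module)."""

    def __init__(self, capacity, momentum=0.5):
        self.capacity = capacity  # assumed upper bound on stored slots
        self.momentum = momentum  # assumed weight kept for the old slot content
        self.slots = []

    def write(self, feature):
        feature = list(feature)
        # While there is room, store the new frame feature as its own slot.
        if len(self.slots) < self.capacity:
            self.slots.append(feature)
            return
        # Memory is full: merge the feature into the most similar slot
        # instead of appending, so the memory size stays constant.
        idx = max(range(len(self.slots)),
                  key=lambda i: cosine(self.slots[i], feature))
        m = self.momentum
        self.slots[idx] = [m * old + (1 - m) * new
                           for old, new in zip(self.slots[idx], feature)]


def simulate(num_frames, capacity):
    """Return (bounded memory size, append-all memory size) after num_frames."""
    memory = BoundedMemory(capacity)
    append_all = []  # baseline that stores every frame, growing linearly
    for t in range(num_frames):
        # Toy 2-D "frame feature" that drifts slowly over time.
        feature = [math.cos(0.1 * t), math.sin(0.1 * t)]
        memory.write(feature)
        append_all.append(feature)
    return len(memory.slots), len(append_all)
```

Calling `simulate(100, 8)` returns the two memory sizes after 100 frames: the bounded memory stays at its capacity while the append-all baseline holds one entry per frame. This is the linear growth that the abstract says the adaptive memory module avoids.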