Semi-supervised video object segmentation is a video processing task that aims to segment the moving objects in a video sequence that are specified by mask annotations given in the first frame. It helps computers understand video scenes and provides important technical support for other computer vision tasks, such as video retrieval and target content replacement. Current algorithms face two challenges. First, it is difficult to balance segmentation accuracy and speed: high-precision methods rely on multiple modules or complex architectures to improve accuracy, which also reduces their speed. Second, current algorithms lack spatial robustness, meaning they may struggle with complex scenes that involve occlusion, rapid motion, or multiple similar objects. To address these challenges, this study aims to develop an efficient semi-supervised video object segmentation algorithm. The main contributions of this research are as follows.

(1) Global Attention Algorithm: To address the difficulty of balancing segmentation accuracy and speed, this paper proposes a global attention algorithm that achieves high-precision, real-time segmentation. The algorithm improves segmentation accuracy by incorporating global attention information during the segmentation process. It encodes image frames and target masks separately. Because mask information is simple, the mask encoder needs far less computation than the image frame encoder and can use a smaller depth and width, which saves substantial computation time and resources. While maintaining segmentation accuracy, the proposed method can process video frames faster than existing methods. Moreover, the global attention algorithm can effectively integrate information from multiple frames during inference without increasing memory consumption. This is attributed to the global attention module's ability to adaptively adjust the information it uses to obtain the best segmentation results. The algorithm therefore has higher practical value and applicability in real-world scenarios, ensuring segmentation accuracy while meeting real-time computing requirements.

(2) Spatial Constraint Algorithm: To resolve potential mismatches in video segmentation, this paper introduces a spatial constraint algorithm that assists the segmentation algorithm. It uses mask information from previous frames to constrain the segmentation scope of the current frame, which helps alleviate mismatches in the current frame. One of its advantages is that it requires almost no additional resource overhead. With the aid of the spatial constraint algorithm, matching algorithms can be effectively optimized, which improves their accuracy and robustness and makes them more suitable for complex visual scenes. In addition, the spatial constraint algorithm has potential applications in other target detection and image segmentation domains.
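
The spatial constraint idea in contribution (2) can be illustrated with a minimal sketch. The abstract does not state how the constraint region is derived from the previous mask, so the code below makes an assumption: the region is the bounding box of the previous frame's mask, grown by a fixed pixel margin, and predictions outside it are suppressed. The names `constraint_region`, `apply_spatial_constraint`, and `margin` are hypothetical and chosen only for illustration. This is not the paper's implementation.

```python
import numpy as np

def constraint_region(prev_mask, margin):
    """Boolean region: bounding box of prev_mask grown by `margin` pixels.

    Assumption for illustration only; the source does not specify
    how the constraint region is built.
    """
    h, w = prev_mask.shape
    region = np.zeros((h, w), dtype=bool)
    ys, xs = np.nonzero(prev_mask)
    if ys.size == 0:
        # No target in the previous frame: leave the current frame unconstrained.
        region[:] = True
        return region
    top = max(int(ys.min()) - margin, 0)
    bottom = min(int(ys.max()) + margin + 1, h)
    left = max(int(xs.min()) - margin, 0)
    right = min(int(xs.max()) + margin + 1, w)
    region[top:bottom, left:right] = True
    return region

def apply_spatial_constraint(prob, prev_mask, margin=2):
    """Zero out foreground probabilities that fall outside the region."""
    region = constraint_region(prev_mask, margin)
    return np.where(region, prob, 0.0)

# Toy example: the target sat near the top-left corner in the previous frame.
prev_mask = np.zeros((8, 8), dtype=bool)
prev_mask[2:4, 2:4] = True

# Current-frame foreground probabilities from some matching step (made up here).
prob = np.zeros((8, 8))
prob[3, 3] = 0.9  # response on the true target
prob[7, 7] = 0.8  # response on a distant, similar-looking object

constrained = apply_spatial_constraint(prob, prev_mask, margin=2)
```

In this toy case the response at the distant look-alike falls outside the grown bounding box and is suppressed, while the response on the target is kept. The only extra work is one bounding-box computation and one element-wise mask per frame, which is consistent with the abstract's claim of almost no additional overhead.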