| Video object segmentation aims to extract objects of interest from complex video scenes and to segment them quickly and accurately. In practical environments, however, it is still hampered by many external interference factors, and it becomes especially challenging when multiple similar objects coexist. To handle single-target and multi-target segmentation in complex video scenes, this paper proposes an unsupervised video single-target segmentation algorithm based on location information fusion and a video multi-target segmentation algorithm guided by target location information. The main research work is as follows. Video sequences exhibit strong inter-frame correlation. Traditional image segmentation methods analyse each frame independently, which makes insufficient use of the inter-frame relationship and results in a large amount of computation. In this paper, optical flow, which describes the motion of pixels between frames, is introduced into the video segmentation task, and a dual-stream unsupervised video object segmentation network is proposed. The network focuses on the motion changes of key parts of the video sequence and uses optical flow to pass information between frames, reducing redundant computation in the segmentation network. The detail information in the optical flow produced by the optical flow network in the dual-branch network is severely lost after feature extraction. To solve this problem, appearance features of the video frame are extracted to compensate for the lost detail of the main targets in the optical flow image. We use a self-attention mechanism and a mutual attention mechanism to extract and fuse the information of the two branches. To improve segmentation accuracy, we adopt a spatial-channel attention module with contextual information fusion, which further refines the
features and makes the segmentation network focus more on the main objects in the video. Comparative experiments on the DAVIS 2016 dataset against mainstream unsupervised video object segmentation algorithms such as PDB, MotAdapt, EpO+, AnDiff and DFNet validate the superior segmentation accuracy of the proposed algorithm. Finally, this paper proposes conditional normalisation for video multi-target segmentation, which reduces the loss of multi-target information caused by conventional normalisation in neural networks and improves the recognition rate for multiple targets. Experiments show that adding the multi-target information guidance module improves the backbone network's recognition and segmentation of multiple targets and enhances the segmentation network's ability to adapt to multi-target segmentation. |
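
The abstract does not state the exact form of the conditional normalisation. As a minimal sketch, assuming a common formulation in which features are first normalised and then scaled and shifted per pixel by a conditioning signal, the idea might look like the following. The function name `conditional_norm`, the soft target-location map, and the scalar weights `w_gamma` and `w_beta` are hypothetical and chosen only for illustration; in a real network the weights would be learned and the operation would run over many channels.

```python
from math import sqrt

def conditional_norm(channel, location_map, w_gamma, w_beta, eps=1e-5):
    """Normalise one feature channel, then modulate it per pixel.

    channel:      H x W nested list of floats (one feature channel)
    location_map: H x W nested list in [0, 1], an assumed soft mask
                  marking where one target is located
    w_gamma, w_beta: scalars that a real network would learn (assumed)
    """
    # Statistics over the whole channel, as in instance normalisation.
    values = [v for row in channel for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = sqrt(var + eps)

    out = []
    for feat_row, loc_row in zip(channel, location_map):
        out_row = []
        for v, m in zip(feat_row, loc_row):
            gamma = 1.0 + w_gamma * m  # per-pixel scale from the location map
            beta = w_beta * m          # per-pixel shift from the location map
            out_row.append(gamma * (v - mean) / std + beta)
        out.append(out_row)
    return out

# Toy 2 x 2 channel with the target assumed to sit at the top-left pixel.
channel = [[0.0, 2.0], [4.0, 6.0]]
target_map = [[1.0, 0.0], [0.0, 0.0]]
empty_map = [[0.0, 0.0], [0.0, 0.0]]

guided = conditional_norm(channel, target_map, w_gamma=0.5, w_beta=0.1)
plain = conditional_norm(channel, empty_map, w_gamma=0.5, w_beta=0.1)
```

With an all-zero map the function reduces to plain normalisation, so `plain` carries no target-specific information. With `target_map`, only the pixel inside the target region is rescaled and shifted, which is one way location information could survive a normalisation layer.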