Image segmentation is the process of partitioning a digital image into disjoint regions according to features such as color, texture, and shape: these features are consistent within a region and differ between regions. Image segmentation is a basic task of computer vision. Depending on whether the user participates, it can be divided into automatic image segmentation and interactive image segmentation. The results of automatic segmentation may not meet the needs of users in some specific application scenarios. With simple interactions (such as clicks, scribbles, or bounding boxes), however, a user can intervene in and control the segmentation process to obtain results that match the user's intention. In such scenarios, interactive image segmentation therefore has advantages that automatic image segmentation does not. Interactive segmentation algorithms include traditional graph-theory-based algorithms and deep-learning-based algorithms. Traditional graph-theory-based algorithms build energy functions from hand-crafted features. Among the commonly used interactions, scribbles are simple to provide and carry richer prior information, and are widely applied in traditional interactive image segmentation algorithms. In recent years, many deep-learning-based semantic segmentation algorithms have emerged, providing a better foundation for deep-learning-based interactive image segmentation, in which the simple mouse click is often used as the form of interaction. However, existing algorithms treat the interaction only as a positional constraint on the algorithm or the network, and do not fully exploit the perceptual ability of the prior information. We therefore focus on semantic perception of user intent. We first enhance the ability of the prior information to perceive regions and boundaries by mapping user interactions to multi-scale hand-crafted features. In addition, in deep interactive image segmentation, we optimize the click encoding and fusion strategy to improve the semantic perception of user intent. The main work of this paper is summarized as follows:

(1) We propose an interactive image segmentation algorithm based on adaptive fusion with diffusion of multi-scale spatial information. The interactive information is mapped into multi-scale structured features, giving the user's interactions the ability of multi-scale semantic perception. First, multi-scale superpixel layers are generated by controlling the number of superpixels. Second, by clustering pixels and superpixels with the interactive information, similarity matrices and label priors at the pixel and superpixel levels are obtained. Then, a fusion-with-diffusion strategy is designed to build the energy function by combining these cues. Finally, the influence coefficient of each scale and the labeling are updated alternately until convergence. Experimental results show that the proposed algorithm better suppresses noise in the segmentation results and makes the results more delicate.

(2) By introducing deep features into interactive image segmentation, we propose a method that fuses multi-scale annotation information. First, by setting different Gaussian radii, two groups of Gaussian maps with different scales are computed for each click, indicating global selection and local fine-tuning, respectively. Second, the small-scale Gaussian maps are fused into the network, and some down-sampling modules of the basic segmentation network are removed, so that richer detail features of the target are extracted. Finally, to maintain the integrity of the target segmentation, a non-local feature attention module is proposed to fuse the large-scale Gaussian maps. Experimental results suggest that the proposed algorithm maintains the integrity of the segmentation while capturing the target's details, which greatly improves interaction efficiency.

(3) We propose a deep interactive image segmentation algorithm based on user intention perception and shape information constraints. First, we develop an adaptive Gaussian map with distinct variances to encode user annotations, which improves sensitivity to details by adaptively adjusting the region affected by each click. Second, we integrate the adaptive Gaussian map into a dual-stream network with two branches: a fully convolutional network that extracts the object's semantic features, and an interactive shape stream that handles shape information, thereby suppressing boundary information outside the target of interest. Experimental results show that the adaptive Gaussian maps excel at fine-tuning details and that clear target boundaries can be obtained with the interactive shape stream, which significantly reduces the user's interaction burden.

(4) We re-examine click embedding in deep interactive image segmentation and propose a feature interactive map that builds a tight relationship between the user's interactions and the semantic information of the target of interest. Furthermore, we develop an interactive non-local block to capture long-range dependencies in the feature interactive map. Meanwhile, we exploit an early-late fusion strategy to fuse the features of the interactive non-local block with those of the basic segmentation network, amplifying the influence of the spatial prior and the semantic information on the final segmentation. Experimental results indicate that the segmentation performance of the proposed algorithm is robust to targets of varying scales.
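The two-scale Gaussian click encoding described in contribution (2) can be sketched as follows. This is a minimal NumPy illustration, not the thesis implementation; the image size, click positions, and radii are assumed values for the example, and each map takes the maximum over per-click Gaussians:

```python
import numpy as np

def gaussian_click_map(shape, clicks, sigma):
    """Encode user clicks as a 2-D Gaussian map (pixel-wise max over clicks)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    gmap = np.zeros(shape, dtype=np.float64)
    for cy, cx in clicks:
        g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))
        gmap = np.maximum(gmap, g)
    return gmap

clicks = [(16, 16), (40, 48)]                               # assumed positions
coarse = gaussian_click_map((64, 64), clicks, sigma=10.0)   # global selection
fine = gaussian_click_map((64, 64), clicks, sigma=3.0)      # local fine-tuning
```

With a large sigma the map covers a broad neighborhood of each click (global selection); with a small sigma it decays quickly and only marks the immediate surroundings (local fine-tuning).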
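The non-local operation underlying the attention modules in contributions (2) and (4) can be illustrated with a minimal self-attention sketch over a C x H x W feature map. This is a generic non-local block in NumPy, not the proposed interactive variant: the theta/phi/g embeddings are random projections purely for illustration (a trained model would learn them), and the final output projection of the standard block is omitted:

```python
import numpy as np

def non_local_block(feats, rng=None):
    """Minimal non-local (self-attention) operation on a C x H x W feature map."""
    c, h, w = feats.shape
    x = feats.reshape(c, h * w)                     # C x N, N = H*W positions
    rng = np.random.default_rng(0) if rng is None else rng
    theta = rng.standard_normal((c // 2, c)) @ x    # queries, C/2 x N
    phi = rng.standard_normal((c // 2, c)) @ x      # keys,    C/2 x N
    g = rng.standard_normal((c, c)) @ x             # values,  C   x N
    attn = theta.T @ phi                            # N x N pairwise affinities
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)         # softmax over positions
    y = g @ attn.T                                  # aggregate values, C x N
    return feats + y.reshape(c, h, w)               # residual connection

feats = np.random.default_rng(1).standard_normal((8, 16, 16))
out = non_local_block(feats)
```

Because every output position aggregates values from all positions, the block captures long-range dependencies that stacked local convolutions reach only slowly.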