Salient object detection aims to identify the most salient objects and regions in an image, and has been widely used as a preprocessing step for computer vision tasks such as object tracking, object recognition, and semantic segmentation. Traditional methods mostly rely on hand-crafted features such as color and texture, or on heuristic priors, to capture the local details and global context of an image; they are limited by the expressiveness of these features, which greatly reduces their ability to detect salient objects in complex scenes. In recent years, convolutional neural networks have developed rapidly: thanks to massive data and the powerful feature representation ability of these models, the performance of deep learning-based algorithms has improved greatly. Using deep convolutional neural networks, this thesis first explores salient object detection in single-frame images, and then extends the research to video salient object detection. The main contributions are as follows.

To address the problem that existing salient object detection algorithms neglect spatial location and boundary details, a network that uses semantic information to guide feature aggregation is proposed, obtaining fine saliency maps through efficient feature aggregation. The network comprises three modules: a Mixing Attention Module (MAM), an Enlarged Receptive Field Module (ERFM), and a Multi-Level Aggregation Module (MLAM). First, the low-level features extracted by the feature extraction network are processed by the ERFM, which enlarges the receptive field while retaining the original edge details, yielding richer spatial context information. Then, the last-layer features of the feature extraction network are processed by the MAM to enhance their representational power; these features serve as a semantic guide that continuously steers feature aggregation during decoding. Finally, the MLAM effectively aggregates features from different layers to obtain the final refined saliency map. Extensive experiments on six benchmark datasets show that the method can effectively locate salient objects and also refines edge details well.

Building on the first work, this thesis extends it to video salient object detection, which requires detecting spatiotemporally salient regions in videos; capturing inter-frame dynamic information is therefore crucial for this task. A motion-guided video salient object detection algorithm is proposed, which uses the motion information in the optical flow map to assist the spatial branch in accurately capturing moving salient objects. The network is designed as a dual-stream structure whose two branches take the optical flow map and the original image as input, respectively. First, the features extracted by the two branches are aggregated under motion guidance to generate aggregated features that are salient both spatially and temporally. The aggregated features are then further enhanced: the original image features provide spatial saliency enhancement, while the optical flow features provide temporal saliency enhancement, giving the enhanced spatiotemporal features a stronger capacity to express spatiotemporal saliency. Finally, the spatiotemporal features are fed to the decoder to obtain the final prediction map. Extensive experiments on four video benchmark datasets, on which the MAE on DAVSOD reaches 0.07, show that the method is competitive and captures salient objects in videos well.
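The mixing-attention idea in the first network combines channel-wise and spatial gating of high-level features. The abstract does not specify the exact formulation, so the following is only a minimal NumPy sketch of a generic mixed channel/spatial attention; the function name, gating choices, and tensor shapes are all hypothetical illustrations, not the thesis's actual MAM.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mixing_attention(feat):
    """Hypothetical sketch of mixed channel + spatial attention.

    feat: (C, H, W) high-level feature map.
    Returns a feature map of the same shape, re-weighted first per
    channel (global average pooling -> sigmoid gate), then per spatial
    location (channel-wise mean -> sigmoid gate).
    """
    # Channel attention: one scalar gate per channel.
    c_gate = sigmoid(feat.mean(axis=(1, 2)))          # (C,)
    feat_c = feat * c_gate[:, None, None]
    # Spatial attention: one scalar gate per pixel.
    s_gate = sigmoid(feat_c.mean(axis=0))             # (H, W)
    return feat_c * s_gate[None, :, :]
```

In a real network both gates would be learned (e.g. small convolutions or fully connected layers) rather than parameter-free poolings; the sketch only shows how the two attention axes compose.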
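The dual-stream fusion described for the video network (motion-guided aggregation followed by spatial and temporal enhancement) can be sketched at a high level. Again, the thesis's exact operations are not given in the abstract, so this is a hypothetical NumPy illustration in which one stream's features gate the other via a sigmoid and a residual connection; all names and shapes are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_spatiotemporal(rgb_feat, flow_feat):
    """Hypothetical sketch of motion-guided feature fusion.

    rgb_feat, flow_feat: (C, H, W) features from the image and
    optical-flow branches. Each step gates the running features with
    the other (or same) stream and adds a residual connection.
    """
    # Motion-guided aggregation: flow features gate appearance features.
    agg = rgb_feat * sigmoid(flow_feat) + rgb_feat
    # Spatial saliency enhancement from the original image features.
    agg = agg * sigmoid(rgb_feat) + agg
    # Temporal saliency enhancement from the optical-flow features.
    agg = agg * sigmoid(flow_feat) + agg
    return agg
```

The gating-plus-residual pattern is a common way to let one modality modulate another without discarding the original signal; the fused features would then be passed to a decoder to produce the prediction map.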