| Given the sheer volume of image data, obtaining useful information efficiently has become a research hotspot. The human visual system can quickly and accurately locate and recognize objects of interest in complex visual scenes by applying a visual attention mechanism. If computers could select and process information as the human visual system does, the speed and accuracy of information processing would improve greatly. Inspired by this, researchers have simulated the human visual attention mechanism in image information processing. Saliency detection, the basic problem of the visual attention mechanism, has been widely studied. Image saliency detection plays an important role in computer vision, and its results can be used effectively in image compression, image retrieval, and image segmentation tasks. Among existing saliency detection models, traditional models cannot produce sufficiently good predictions when salient objects appear against low-contrast backgrounds or have a confusing visual appearance. FCN-based models mainly obtain salient information through a linear combination of high-level features extracted from the last several convolutional layers. Such salient information lacks low-level visual information, so object boundaries are predicted poorly. In addition, an FCN needs an up-sampling operation to produce an output of the same size as the input image; if the high-level features are up-sampled directly, the prediction results are sparse and irregular. To solve these problems, this thesis constructs a salient region detection model based on full convolution and an encoder-decoder architecture, and achieves salient object instance segmentation based on this model. The main contents of this thesis are as follows: (1) To address the inaccurate object boundary localization and the sparse, irregular saliency maps of existing models, a salient region detection model based on full convolution and an encoder-decoder architecture is constructed. The model construction consists of four parts: the encoding process, the decoding process, the multi-layer feature connection operation, and saliency map fusion. The encoding process retains spatial information, the decoding process reduces the sparsity of the saliency map, the multi-layer feature connection operation refines object boundaries and reduces the irregularity of the saliency map, and the saliency map fusion operation brings the final saliency map closer to the ground truth. Extensive experiments on five public datasets compare the model with eight state-of-the-art models in terms of precision-recall curves, F-measure, and mean absolute error. The experiments show that the proposed model outperforms the other models. (2) Salient object instance segmentation is achieved on the basis of this salient region detection model. First, a multi-level segmentation method is used to sharpen the ambiguous boundaries of the saliency maps. Then, object proposals are obtained with the SSD model and refined to determine the number of salient object instances. Finally, the salient object instances are segmented by a fully connected conditional random field that combines the saliency maps with the salient object proposals. Experiments show that the proposed method segments salient object instances well, that better saliency maps yield better segmentation results, and that the method outperforms other methods on mean pixel accuracy and mean intersection over union. |
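
The evaluation measures named above can be illustrated with a minimal sketch. It is not the thesis's evaluation code: the function names, the fixed binarization threshold of 0.5, and the weight `beta_sq = 0.3` (a value often used in saliency benchmarks) are assumptions made for illustration, and maps are represented as plain nested lists with values in [0, 1].

```python
def mean_absolute_error(saliency, truth):
    """Average absolute per-pixel difference between two maps in [0, 1]."""
    total = sum(abs(s - t)
                for row_s, row_t in zip(saliency, truth)
                for s, t in zip(row_s, row_t))
    count = sum(len(row) for row in truth)
    return total / count


def binarize(saliency, threshold=0.5):
    """Turn a saliency map into a 0/1 mask (threshold is an assumed value)."""
    return [[1 if s >= threshold else 0 for s in row] for row in saliency]


def f_measure(saliency, truth, threshold=0.5, beta_sq=0.3):
    """Weighted F-measure of the binarized map against a binary ground truth."""
    mask = binarize(saliency, threshold)
    tp = fp = fn = 0
    for row_m, row_t in zip(mask, truth):
        for m, t in zip(row_m, row_t):
            if m and t:
                tp += 1
            elif m and not t:
                fp += 1
            elif t and not m:
                fn += 1
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


def intersection_over_union(mask, truth):
    """IoU of two binary masks; two empty masks count as a perfect match."""
    inter = union = 0
    for row_m, row_t in zip(mask, truth):
        for m, t in zip(row_m, row_t):
            inter += 1 if (m and t) else 0
            union += 1 if (m or t) else 0
    return inter / union if union else 1.0


# Toy 2x2 example (made-up values, not data from the thesis).
saliency_map = [[0.9, 0.2], [0.6, 0.1]]
ground_truth = [[1, 0], [0, 0]]
mae = mean_absolute_error(saliency_map, ground_truth)
fm = f_measure(saliency_map, ground_truth)
iou = intersection_over_union(binarize(saliency_map), ground_truth)
```

Mean intersection over union would average `intersection_over_union` over classes or instances, and a precision-recall curve would repeat the precision and recall computation over a sweep of thresholds instead of the single threshold used here.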