Research On Salient Object Detection In Light Field Based On Collaborative Attention Mechanism

Posted on: 2024-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Li
GTID: 2568306920475044
Subject: Computer Science and Technology
Abstract/Summary:
Salient object detection is a classic research topic in computer vision. It aims to simulate the human visual perception system and locate the most attractive targets in a scene. It has been widely applied in computer vision tasks and has received increasing attention from researchers over the past decade. Light field data records depth information about natural scenes through multiple viewpoints or focal lengths, and this information is conducive to salient object detection. Using light field data as the input to neural networks, in order to improve on traditional saliency detection algorithms based on RGB images, has therefore become a new research direction. With the rapid development of deep learning, more and more deep models have been proposed to improve on traditional saliency detection algorithms.

However, most research in this field still has several main limitations. Firstly, most existing models are built as a dual-stream network that takes focal stacks and all-in-focus images as inputs, but they rarely model inter-slice relationships in the encoding stage of the focal stack branch. Here, the focal stack is a group of slices with different focal lengths, each of which tends to focus on different local details, while the all-in-focus image can be generated from the focal stack through digital montage technology and places more emphasis on depicting spatial details in the scene. Secondly, existing models rarely explore the complementarity between focal stacks and all-in-focus images. Given the characteristics of the light field, and the fact that salient regions typically occur at a specific focal length in a given scene, more attention should be paid to the useful information contained in the multi-modal data. Finally, in the decoding stage, many methods apply only a single attention mechanism to weight key feature channels when fusing features across modalities. A new cross-modal fusion strategy with richer attention mechanisms is needed to learn powerful cross-modal complementarity.

Firstly, this paper proposes an asymmetric dual-branch backbone network to detect salient objects in the light field. The network consists of a 2D CNN and a 3D CNN, so that different deep convolutional neural networks encode the information in the focal stacks and in the all-in-focus images. Adjacent high-level features are then fused through a combination of a residual block and a channel attention mechanism. The selected multi-modal features are then fused progressively to obtain the final predicted saliency map. Secondly, to obtain rich spatial feature information from the all-in-focus images, the Swin Transformer framework is explored on top of the original model to optimize and improve the all-in-focus branch. The improved branch achieves more accurate results. Finally, considering the complementary features between the focal stack and the all-in-focus image, this study further explores using the same Swin Transformer block to fuse multi-modal features at different stages, making better use of the information shared between the modalities and further improving the performance of the detection model.

Experiments show that the proposed asymmetric dual-branch salient object detection network outperforms state-of-the-art saliency detection models, including both traditional methods and deep learning methods, on three widely used light field datasets (LFSD, HFUT and DUT_LF) across four evaluation metrics. Ablation experiments on the improved all-in-focus branch and the multi-stage feature fusion module further validate the effectiveness and superiority of the proposed model.
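
The fusion of adjacent features through a residual block combined with channel attention can be illustrated with a small sketch. The abstract does not specify the layer layout, so everything below is an assumption: the gate follows the common squeeze-and-excitation pattern (global average pooling, one linear layer, a sigmoid), and the function name `channel_attention_fuse` and its parameters `weights` and `bias` are hypothetical. It is written in plain Python on nested lists so that it runs without a deep learning framework; it is not the thesis implementation.

```python
import math


def channel_attention_fuse(feat_a, feat_b, weights, bias):
    """Fuse two feature maps shaped [C][H][W] (nested lists of floats).

    Assumed layout, not taken from the thesis: sum the two inputs,
    derive one gate per channel, rescale, then add a residual path.
    """
    channels = len(feat_a)

    # Element-wise sum of the two adjacent-level feature maps.
    summed = [
        [[a + b for a, b in zip(row_a, row_b)]
         for row_a, row_b in zip(ch_a, ch_b)]
        for ch_a, ch_b in zip(feat_a, feat_b)
    ]

    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in summed
    ]

    # Excite: one linear layer followed by a sigmoid gate in (0, 1).
    gates = []
    for c in range(channels):
        z = sum(weights[c][k] * pooled[k] for k in range(channels)) + bias[c]
        gates.append(1.0 / (1.0 + math.exp(-z)))

    # Rescale each channel by its gate and add the residual (skip) path.
    fused = [
        [[v * gates[c] + v for v in row] for row in summed[c]]
        for c in range(channels)
    ]
    return fused, gates
```

In a trained network the weights and bias would be learned, so that channels carrying salient-region cues receive gates close to 1 and uninformative channels are suppressed; the residual path keeps the unweighted features available to later stages.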
Keywords/Search Tags: Salient object detection, light field, attention mechanism, Transformer