
Research On Visual Saliency Prediction Method Based On Deep Learning

Posted on: 2024-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: F Z Yang
Full Text: PDF
GTID: 2568307118980299
Subject: Information and Communication Engineering
Abstract/Summary:
Visual saliency prediction simulates the mechanism of human vision, searching for and locating the regions of an image that attract human eye attention. As a preprocessing step for many visual tasks, it has been widely applied in image segmentation, visual tracking, image compression, and image retrieval. Because convolution is a local operation that lacks a global view of the image, convolutional neural network-based saliency prediction models cannot make full use of long-range contextual information. In addition, most saliency prediction models achieve better prediction performance at the cost of higher computational cost and larger parameter counts, which hinders deployment in realistic scenarios. This thesis studies visual saliency prediction; the main work is summarized as follows:

(1) To address the limited ability to encode long-range contextual information, a Siamese Transformer saliency prediction model based on multi-prior enhancement and cross-modal attention collaboration (ME-CAS) is proposed. A Transformer-based Siamese architecture is designed as the backbone for feature extraction. One Siamese branch captures the long-range context of the image through the self-attention mechanism to obtain a global saliency map. Meanwhile, a multi-prior knowledge module is constructed to learn the human visual center-bias prior, contrast prior, and frequency prior; its output is fed into the other Siamese branch, which learns the details of low-level visual features and produces a local saliency map. Finally, an attention calibration module guides the global and local information to be learned collaboratively across modalities and generates the final saliency map. The multi-prior learning module also improves the interpretability of the model. Extensive experiments demonstrate that ME-CAS achieves superior results on public saliency prediction benchmarks.

(2) To address the practical problems of model complexity, slow inference, and large parameter counts, a lightweight visual saliency prediction model based on inductive-bias guidance and multi-path progressive fusion (IBMP-SP) is proposed. A lightweight network replaces the traditional convolutional backbone, reducing trainable parameters and shortening inference time; a lightweight Transformer module guided by inductive bias improves the feature learning ability of the lightweight backbone with few additional parameters; finally, a multi-path progressive feature fusion module built from depthwise and pointwise convolutions preserves both accuracy and runtime efficiency. Quantitative and qualitative comparisons on multiple datasets show that the model remains competitive in prediction performance while using fewer parameters and less computation.
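The long-range context mechanism described in (1) rests on scaled dot-product self-attention, in which every token (image patch) attends to every other token regardless of spatial distance. The following is a minimal NumPy sketch of that operation, not the thesis's actual model: the query/key/value projections are omitted (identity weights) purely for brevity.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over a flattened patch sequence.

    x: (n_tokens, d) feature matrix. Q/K/V projections are identity here,
    so this shows only the global mixing that convolution lacks.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                 # pairwise token affinities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax over keys
    return attn @ x                               # each token mixes all tokens

tokens = np.random.default_rng(0).normal(size=(16, 32))
out = self_attention(tokens)
```

Because `attn` is a dense n_tokens x n_tokens matrix, the output at any position can draw on features from the whole image in a single layer, which is the long-range advantage the Transformer branch exploits.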
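Of the priors named in (1), the center-bias prior is the most standard: human fixations cluster near the image center, so a centered Gaussian is a common hand-crafted form of it. The sketch below generates such a map; the relative `sigma` value is an illustrative assumption, not a parameter from the thesis.

```python
import numpy as np

def center_bias_prior(h, w, sigma=0.25):
    """Isotropic Gaussian center-bias map, peak 1.0 at the image center.

    Coordinates are normalized by image size so sigma is resolution-independent.
    """
    ys = (np.arange(h) - (h - 1) / 2) / h
    xs = (np.arange(w) - (w - 1) / 2) / w
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    prior = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return prior / prior.max()

p = center_bias_prior(9, 7)
```

In ME-CAS such prior maps are learned rather than fixed, but a fixed map like this one conveys what the module encodes: pixels near the center receive higher a priori saliency.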
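The depthwise plus pointwise convolution pair in (2) is the standard depthwise separable factorization: one spatial filter per channel, followed by a 1x1 convolution that mixes channels. A minimal NumPy sketch, assuming stride 1 and no padding (this illustrates the factorization, not the thesis's fusion module):

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """x: (C, H, W); dw_kernels: (C, k, k), one spatial filter per channel;
    pw_weights: (C_out, C), the 1x1 pointwise mixing. 'valid' padding, stride 1.
    """
    C, H, W = x.shape
    k = dw_kernels.shape[-1]
    Ho, Wo = H - k + 1, W - k + 1
    dw = np.zeros((C, Ho, Wo))
    for c in range(C):                      # depthwise: per-channel spatial filter
        for i in range(Ho):
            for j in range(Wo):
                dw[c, i, j] = np.sum(x[c, i:i + k, j:j + k] * dw_kernels[c])
    # pointwise: 1x1 conv mixes channels at each spatial location
    return np.einsum("oc,chw->ohw", pw_weights, dw)
```

The parameter count drops from C_out * C * k * k for a standard convolution to C * k * k + C_out * C, which is the source of the efficiency gain the lightweight fusion module relies on.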
Keywords/Search Tags: visual saliency prediction, convolutional neural network, Transformer, multi-prior knowledge, lightweight