
Research On Visual Saliency Prediction Method Based On Deep Learning

Posted on: 2024-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: F Z Yang
Full Text: PDF
GTID: 2568307118980299
Subject: Information and Communication Engineering
Abstract/Summary:
Visual saliency prediction simulates the mechanism of human vision, searching for and locating the regions of an image that attract human eye attention. As a preprocessing step for many visual tasks, it has been widely applied in image segmentation, visual tracking, image compression, and image retrieval. Because convolution is a local operation that lacks a global view of the image, convolutional neural network-based saliency prediction models cannot make full use of long-range contextual information. In addition, most saliency prediction models achieve better prediction performance at the cost of higher computational cost and larger parameter counts, which hinders deployment in realistic scenarios. This thesis studies visual saliency prediction; the main work is summarized as follows:

(1) To address the limited ability to encode long-range contextual information, a Siamese Transformer saliency prediction model based on multi-prior enhancement and cross-modal attention collaboration (ME-CAS) is proposed. A Transformer-based Siamese architecture is designed as the backbone for feature extraction. One Siamese branch captures the long-range context of the image through the self-attention mechanism to obtain a global saliency map. Meanwhile, a multi-prior knowledge module is constructed to learn the human visual center-bias prior, contrast prior, and frequency prior; its output is fed into the other Siamese branch, which learns the details of low-level visual features and produces a local saliency map. Finally, an attention calibration module guides the global and local information to be learned collaboratively across modalities and generates the final saliency map. The multi-prior learning module also improves the interpretability of the model. Extensive experiments demonstrate that ME-CAS achieves superior results on public saliency prediction benchmarks.

(2) To address the practical problems of model complexity, slow inference, and large parameter counts, a lightweight visual saliency prediction model based on inductive-bias guidance and multi-path progressive fusion (IBMP-SP) is proposed. A lightweight network replaces the traditional convolutional backbone, reducing trainable parameters and shortening inference time; a lightweight Transformer module guided by inductive bias improves the feature learning ability of the lightweight backbone with few additional parameters; finally, a multi-path progressive feature fusion module built from depthwise and pointwise convolutions preserves both accuracy and runtime efficiency. Quantitative and qualitative comparisons on multiple datasets show that the model remains competitive in prediction performance while using fewer parameters and less computation.
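The long-range context mechanism described in (1) rests on scaled dot-product self-attention, in which every token (image patch) attends to every other token regardless of spatial distance. The following is a minimal NumPy sketch of that operation, not the thesis's actual model: the query/key/value projections are omitted (identity weights) purely for brevity.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over a flattened patch sequence.

    x: (n_tokens, d) feature matrix. Q/K/V projections are identity here,
    so this shows only the global mixing that convolution lacks.
    """
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                 # pairwise token affinities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # softmax over keys
    return attn @ x                               # each token mixes all tokens

tokens = np.random.default_rng(0).normal(size=(16, 32))
out = self_attention(tokens)
```

Because `attn` is a dense n_tokens x n_tokens matrix, the output at any position can draw on features from the whole image in a single layer, which is the long-range advantage the Transformer branch exploits.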
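Of the priors named in (1), the center-bias prior is the most standard: human fixations cluster near the image center, so a centered Gaussian is a common hand-crafted form of it. The sketch below generates such a map; the relative `sigma` value is an illustrative assumption, not a parameter from the thesis.

```python
import numpy as np

def center_bias_prior(h, w, sigma=0.25):
    """Isotropic Gaussian center-bias map, peak 1.0 at the image center.

    Coordinates are normalized by image size so sigma is resolution-independent.
    """
    ys = (np.arange(h) - (h - 1) / 2) / h
    xs = (np.arange(w) - (w - 1) / 2) / w
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    prior = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return prior / prior.max()

p = center_bias_prior(9, 7)
```

In ME-CAS such prior maps are learned rather than fixed, but a fixed map like this one conveys what the module encodes: pixels near the center receive higher a priori saliency.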
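The depthwise plus pointwise convolution pair in (2) is the standard depthwise separable factorization: one spatial filter per channel, followed by a 1x1 convolution that mixes channels. A minimal NumPy sketch, assuming stride 1 and no padding (this illustrates the factorization, not the thesis's fusion module):

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """x: (C, H, W); dw_kernels: (C, k, k), one spatial filter per channel;
    pw_weights: (C_out, C), the 1x1 pointwise mixing. 'valid' padding, stride 1.
    """
    C, H, W = x.shape
    k = dw_kernels.shape[-1]
    Ho, Wo = H - k + 1, W - k + 1
    dw = np.zeros((C, Ho, Wo))
    for c in range(C):                      # depthwise: per-channel spatial filter
        for i in range(Ho):
            for j in range(Wo):
                dw[c, i, j] = np.sum(x[c, i:i + k, j:j + k] * dw_kernels[c])
    # pointwise: 1x1 conv mixes channels at each spatial location
    return np.einsum("oc,chw->ohw", pw_weights, dw)
```

The parameter count drops from C_out * C * k * k for a standard convolution to C * k * k + C_out * C, which is the source of the efficiency gain the lightweight fusion module relies on.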
Keywords/Search Tags: visual saliency prediction, convolutional neural network, Transformer, multi-prior knowledge, lightweight