
Research On Aesthetics-based Image Spatio-temporal Visual Attention

Posted on: 2021-05-26  Degree: Master  Type: Thesis
Country: China  Candidate: J C Lv
GTID: 2518306548981709  Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rise of the mobile Internet and the resulting flood of images, how to extract value from images has attracted much attention. The visual attention mechanism helps address this problem. Previous research on visual attention covered static and dynamic attention, for which spatial saliency map models and temporal scanpath models were proposed, respectively. However, this work ignored human sentiment, even though human aesthetic sentiment is closely related to visual attention. This thesis therefore proposes an aesthetics-based spatio-temporal visual attention prediction algorithm, consisting of the following two methods for image saliency map prediction and scanpath prediction, respectively:

1. An aesthetics-based multi-task saliency map prediction method. This thesis introduces aesthetic context into visual attention prediction for the first time. To this end, an encoder-decoder model based on multi-task convolutional neural networks is designed. A color image is fed into the shared backbone encoder and then, depending on its data source, passed through either the aesthetics decoder or the saliency decoder; the loss is computed and the model parameters are updated by backpropagation. Three popular backbones are used as encoders, with parameters shared between the aesthetics task and the saliency map task. Experimental results show that the aesthetics-based multi-task model generates accurate saliency maps and outperforms most existing saliency models, which confirms the positive effect of aesthetic sentiment on visual attention prediction.

2. A scanpath prediction method based on coarse-to-fine networks. The aesthetics-based saliency map prediction algorithm provides an effective solution for spatial attention. Therefore, aesthetics-based saliency features are extracted from the spatial attention model, and coarse-to-fine networks are designed to solve the temporal attention problem. The coarse-to-fine networks are composed of convolutional neural networks and LSTM networks. To predict scanpaths accurately, this thesis proposes an Inhibition of Return Attention (IRA) mechanism, inspired by the Inhibition of Return (IOR) mechanism, and feeds the IRA sequence to the fine LSTM networks at different time steps. A new scanpath loss is then proposed to train the entire model. Experimental results demonstrate the effectiveness of the IRA mechanism and the scanpath loss, and show that the method outperforms most existing scanpath prediction algorithms.
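
The Inhibition of Return (IOR) idea that the abstract cites as inspiration can be illustrated with a minimal sketch. This is not the thesis's IRA mechanism or its LSTM-based networks, whose details the abstract does not give; it is the classic greedy formulation, in which the most salient location is fixated and saliency around it is then damped so that attention moves on. The function name `greedy_scanpath`, the Gaussian suppression, the `sigma` parameter, and the toy saliency map are all assumptions made for illustration.

```python
import math


def greedy_scanpath(saliency, n_fixations, sigma=1.0):
    """Hypothetical helper: fixations by repeated argmax with Gaussian IOR."""
    # Work on a copy so the caller's saliency map is left untouched.
    sal = [row[:] for row in saliency]
    height, width = len(sal), len(sal[0])
    path = []
    for _ in range(n_fixations):
        # Next fixation: the currently most salient (row, col) location.
        r, c = max(
            ((i, j) for i in range(height) for j in range(width)),
            key=lambda p: sal[p[0]][p[1]],
        )
        path.append((r, c))
        # Inhibition of return: scale saliency by 1 - Gaussian(distance),
        # which zeroes the fixated cell and damps its neighbourhood.
        for i in range(height):
            for j in range(width):
                d2 = (i - r) ** 2 + (j - c) ** 2
                sal[i][j] *= 1.0 - math.exp(-d2 / (2.0 * sigma ** 2))
    return path


# Toy 4x4 saliency map (illustrative values, not data from the thesis).
toy_map = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.7],
    [0.0, 0.1, 0.6, 0.8],
]
print(greedy_scanpath(toy_map, n_fixations=3))
```

Without the suppression step, every iteration would return the same peak; damping visited regions is what turns a static saliency map into an ordered sequence of fixations. A learned model such as the one described above would replace this fixed rule with trained components.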
Keywords/Search Tags: Visual attention, Saliency map prediction, Scanpath, Convolutional neural networks, LSTM networks