| In the era of intelligent media and intelligent marketing, computational advertising has become an independent, interdisciplinary field. From the early focus on creative content that appeals to audience taste, to the use of cognitive psychology, neurobiology, and media technologies to stimulate the user’s senses, and on to big data computing that captures user needs for personalized recommendation, the enterprise’s pursuit of efficiency and the user’s exploration of individual lifestyles have become increasingly well integrated. While the big data era brings the dividend of information explosion, it also brings a large amount of redundant information, which poses new challenges for accurately estimating consumer psychology and predicting consumer decisions. Advertising content consists mostly of artificial scenes that incorporate many strategies and techniques from cognitive psychology. Compared with natural images, advertisements have richer visual elements, more condensed textual semantics, more metaphorical scenes, and more diverse styles, and they often exhibit commercial attributes such as being persuasive, purposeful, and inducing. However, how these characteristics affect users’ visual attention, psychological motivation, affective bias, and behavioral decisions is difficult to assess. Related studies are concentrated in economics and management and rely on comparative statistical verification. Most of their conclusions are empirical and difficult to quantify. Publicly available advertising datasets are also lacking. Against the background of rapid development in interdisciplinary research and computer vision, this thesis proposes a new approach: tracing the visual origins of advertising stimuli from the perspective of users’ visual perception and emotion analysis. Following an overall research route from “how to see” to “how to think” about 
advertisement, this thesis aims to infer users’ emotional attitudes towards advertisements from eye movement information, providing biological theoretical support and a computational paradigm for advertising practices such as preference prediction and personalized recommendation. The main work and contributions of this thesis are as follows: (1) To explore the relationship between advertising visuals and users’ emotion perception, this thesis first builds an eye-tracking-based advertising image dataset. A high-precision eye tracker records the eye movement data of 57 subjects with different personality traits while they freely view 1000 advertising images, together with their subjective ratings on affective dimensions such as advertising preference, emotional attributes, aesthetic feeling, and brand liking. In addition, correlation analyses are carried out among visual attention, advertising affective attributes, eye movement characteristics, and the subjects’ personality traits, yielding a series of instructive conclusions. This dataset fills a gap in the research community, which lacks publicly available multimedia datasets with personalized affective labels and visual attention benchmarks. It also provides a reliable data basis for visual attention computation and visual sentiment analysis in unnatural scenes, and a new benchmark for theoretical support and experimental comparison of visual tasks in natural scenes. (2) To obtain the distribution of users’ visual attention while viewing advertisements, state-of-the-art saliency prediction models are tested on the proposed advertisement dataset ADD1000. The prediction accuracy of these methods is found to be generally low, and the better-performing models suffer from slow inference, high computational complexity, or large model size, which makes them inconvenient for practical applications. Therefore, a fast saliency 
prediction model based on multi-channel activation optimization is proposed. In a two-branch siamese network architecture, two lightweight backbone networks with the same structure learn the global and local saliency features of the image, respectively, which speeds up inference. A multi-channel activation optimization module is designed based on three handcrafted features, which increases the interpretability of the model and refines the saliency representation of details. Parameter sharing between the siamese networks further reduces the model size. Experimental results on multiple saliency datasets show that the proposed model balances prediction performance, inference speed, computational complexity, and model size, and is therefore highly practical. (3) Unlike natural images with random content, advertising images are mainly composed of text elements and picture elements. However, state-of-the-art saliency models designed for natural scenes cannot accurately capture the difference in visual attraction between these two heterogeneous elements; in particular, they underpredict saliency in text regions. Therefore, this thesis proposes an advertising saliency prediction model based on text enhancement learning. The model includes three core modules: a general saliency prior feature extraction module, a textual saliency enhancement learning module, and a feature fusion learning module. Specifically, an advanced optical character recognition algorithm identifies the advertising text regions, and a pure text map is generated by dilation and erosion operations. The text map is fed into a lightweight backbone for textual saliency feature learning, and the result is fused with the saliency prior features extracted by a state-of-the-art saliency model. Experimental results show that the proposed model outperforms current mainstream saliency prediction methods on our dataset. The 
proposed model can also serve as an optimization framework that improves the ability of general saliency models to predict saliency in advertising images. (4) To explore the emotional cognition cues about advertising that are concealed in human vision, a graph neural network is introduced for the first time to model eye movement features, and an advertising preference prediction model based on personalized eye movement graph inference is proposed. Under the visual attention selection mechanism, the human eye attends only to the region of interest, that is, the visually salient area, rather than the entire visual field of the advertisement. Therefore, personalized visual saliency is introduced and matched with the advertisement content to realize visual sentiment analysis from “how to see” to “what to see” and then to “how to think”. There are three steps. First, the eye movement data are represented by graph embedding, and a graph neural network (GNN) module is designed to learn the advertising preference feature representation of the eye movement topology. Second, the ad image is mapped to a high-dimensional feature space by standard convolution, the pixels are represented as nodes to form a graph embedding, and the graph is fed into the GNN module to learn the advertising preference feature representation of the advertisement’s global semantics. Third, the saliency feature and the global content feature of the advertisement are fused in the high-dimensional space, the disjoint regions in the feature map are aggregated into new nodes to form a graph embedding, and the graph is fed into the GNN module to learn the advertising preference feature representation related to the personalized salient content. Finally, the three feature representations are mapped to the current observer’s advertising preference score through global average pooling and a fully connected layer. To the best of our knowledge, this is the first computable 
model for predicting advertising emotional cognition based on eye movement inference. |
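
The abstract states that a pure text map is generated from recognized text regions by dilation and erosion, but gives no further detail. The following is a minimal sketch of one way such a step could look, not the thesis's implementation. It assumes that the OCR stage returns axis-aligned boxes as `(x0, y0, x1, y1)` with exclusive end coordinates, that dilation is applied before erosion (a morphological closing), and that the structuring element is a square of half-width `k`. The function names, the box format, and the value of `k` are all hypothetical.

```python
import numpy as np


def boxes_to_mask(shape, boxes):
    """Rasterize hypothetical OCR boxes (x0, y0, x1, y1) into a binary mask."""
    mask = np.zeros(shape, dtype=bool)
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = True
    return mask


def dilate(mask, k=1):
    """Binary dilation with a (2k+1) x (2k+1) square structuring element."""
    h, w = mask.shape
    padded = np.pad(mask, k, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    # A pixel is set if any pixel in its square neighbourhood is set.
    for dy in range(2 * k + 1):
        for dx in range(2 * k + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def erode(mask, k=1):
    """Binary erosion via duality: erode(A) = not dilate(not A).

    With this formulation, pixels outside the image count as foreground,
    so the image border alone never erodes a region.
    """
    return ~dilate(~mask, k)


def text_map_from_boxes(shape, boxes, k=1):
    """Dilate then erode (closing): merges boxes separated by small gaps."""
    return erode(dilate(boxes_to_mask(shape, boxes), k), k)


# Two word-level boxes on the same text line, separated by a 2-pixel gap.
boxes = [(2, 3, 5, 6), (7, 3, 10, 6)]
mask = boxes_to_mask((10, 14), boxes)
text_map = text_map_from_boxes((10, 14), boxes, k=1)
```

In this sketch the closing joins neighbouring word boxes into a single text-line region without growing the outer extent of the text, which is one plausible reason to pair the two operations. The resulting binary map would then be the input to the text branch described in contribution (3).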