
Visual Attention

Posted on: 2011-08-18, Degree: Master, Type: Thesis
Country: China, Candidate: P Bian
GTID: 2208360305997479, Subject: Circuits and Systems
Abstract/Summary:
In the human visual system, visual information is first decomposed into basic features such as intensity, color, orientation, and motion. Although this decomposition is carried out quickly for the entire visual input in a massively parallel network, our perception of the visual scene is directed to only a limited area at any given time, and shifts serially as we change our focus of attention. What, then, is the purpose of computing such a large feature set over the entire visual input if we can perceive only a small area? The feature set is used to locate generally salient areas that will become the next focus of attention, which is crucial for reducing search time in detection and recognition tasks.

In this thesis we study this visual attention mechanism, specifically frequency domain based approaches to bottom-up saliency detection. These approaches are fast and consistent with psychophysics; however, they lack biological justification and are justified only heuristically, through their results. The motivation for our research is to provide biological justification for the frequency domain based approach and to propose a more biologically plausible frequency domain based method that is simple, fast, and outperforms current methods. The four main contributions are as follows:

1. We derive the frequency domain equivalent of divisive normalization, a mathematical model of lateral inhibition that is believed to be responsible for the center-surround antagonism that plays a key role in visual attention. We show the connection between this derivation and energy equalization, or whitening, a theory of information maximization. We name our theory spectral whitening (SW).

2. Based on our SW theory, we propose a fast frequency domain based saliency detection model that is also biologically plausible, referred to as frequency domain divisive normalization (FDN). We show that the initial feature extraction stage, common to all spatial domain approaches, can be simplified to a Fourier transform with a contourlet-like grouping of coefficients, and that saliency detection can be achieved using frequency domain divisive normalization. Experiments show that our model is consistent with single cell recordings and psychophysics, and that it outperforms current state-of-the-art methods in eye fixation prediction.

3. Since Fourier coefficients are global, FDN assumes a global surround, whereas in our visual system the surround is local. To resolve this, we extend our FDN model by conducting piecewise FDN (PFDN) on overlapping local patches of a Laplacian pyramid, which provides better biological plausibility. We also add a motion component using a phase correlation compensated (PCC) difference map to complete our spatiotemporal model of visual attention. In both static and dynamic eye fixation prediction experiments, PFDN outperforms all other methods, including FDN.

4. An application of our PFDN model is proposed in the field of region of interest (ROI) image compression. We give a JPEG post-processing method that sets DCT coefficients lower than a threshold value to 0. DCT blocks with high salience are given a low threshold, while DCT blocks with low salience are given a high threshold. In this way, we increase the sparseness of low salience DCT blocks, giving those blocks a higher compression ratio while maintaining the compression ratio of high salience DCT blocks. Experiments show that our method, sparse JPEG, performs better than the existing blurring pre-processing method.
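The link between whitening and saliency described in the abstract can be illustrated with a minimal Python sketch. It is not the FDN model itself: FDN normalizes contourlet-like groups of Fourier coefficients, whereas this sketch assumes the simplest case, in which every coefficient is divided by its own magnitude (global energy equalization). The function name `spectral_whitening_saliency` and the parameters `sigma` and `eps` are hypothetical choices made for illustration only.

```python
import numpy as np

def spectral_whitening_saliency(image, sigma=2.0, eps=1e-9):
    """Illustrative saliency map from a globally whitened Fourier spectrum.

    Assumption: each coefficient is normalized by its own magnitude,
    which is a simplification of the grouped normalization used by FDN.
    """
    image = np.asarray(image, dtype=float)
    spectrum = np.fft.fft2(image)

    # Energy equalization: keep the phase, flatten the amplitude spectrum.
    whitened = spectrum / (np.abs(spectrum) + eps)

    # Back to the spatial domain; squared magnitude acts as local energy.
    energy = np.abs(np.fft.ifft2(whitened)) ** 2

    # Gaussian smoothing done as a multiplication in the frequency domain
    # (circular convolution); sigma is in pixels.
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    kernel = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    smooth = np.real(np.fft.ifft2(np.fft.fft2(energy) * kernel))

    # Scale to a maximum of 1 unless the map is entirely zero.
    peak = smooth.max()
    return smooth / peak if peak > 0 else smooth
```

On an image made of a regular grating plus one small isolated blob, the grating occupies a few very strong coefficients that whitening scales down to unit magnitude, so the map is expected to peak at the blob.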
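The saliency-dependent thresholding behind sparse JPEG can be sketched as follows. This is an illustration under stated assumptions, not the method as implemented in the thesis: it computes its own orthonormal 8x8 DCT on raw pixel blocks instead of operating on the coefficients of an actual JPEG file, it assumes the saliency map is scaled to [0, 1] and the image dimensions are multiples of the block size, and it interpolates the threshold linearly between two values. The names `dct_matrix`, `sparsify_blocks`, `t_low`, and `t_high` are hypothetical.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix; row k holds the k-th basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d

def sparsify_blocks(image, saliency, t_low=2.0, t_high=20.0, block=8):
    """Zero small DCT coefficients, with a higher threshold where salience is low.

    Assumptions: `saliency` lies in [0, 1] and has the same shape as `image`;
    image height and width are multiples of `block`.
    Returns the reconstructed image and the nonzero-coefficient count per block.
    """
    image = np.asarray(image, dtype=float)
    d = dct_matrix(block)
    rows, cols = image.shape[0] // block, image.shape[1] // block
    out = np.zeros_like(image)
    nonzero = np.zeros((rows, cols), dtype=int)

    for by in range(rows):
        for bx in range(cols):
            sl = (slice(by * block, (by + 1) * block),
                  slice(bx * block, (bx + 1) * block))
            coeffs = d @ image[sl] @ d.T  # forward 2-D DCT of the block

            # Mean salience 1 -> t_low, mean salience 0 -> t_high.
            s = float(np.clip(saliency[sl].mean(), 0.0, 1.0))
            threshold = t_high - (t_high - t_low) * s

            coeffs[np.abs(coeffs) < threshold] = 0.0
            nonzero[by, bx] = np.count_nonzero(coeffs)
            out[sl] = d.T @ coeffs @ d  # inverse 2-D DCT

    return out, nonzero
```

The per-block nonzero counts give a rough proxy for how compressible each block has become; in a real codec the effect would depend on quantization and entropy coding, which this sketch does not model.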
Keywords/Search Tags: visual attention, saliency map, lateral inhibition, divisive normalization, spectral whitening (SW), frequency domain divisive normalization (FDN), piecewise frequency domain divisive normalization (PFDN), phase correlation