
Visual masking in natural scenes: Database, models, and an application to perceptual image coding

Posted on: 2016-09-27
Degree: Ph.D.
Type: Dissertation
University: Oklahoma State University
Candidate: Alam, Md Mushfiqul
Full Text: PDF
GTID: 1478390017978128
Subject: Electrical engineering
Abstract/Summary:
Studies of visual masking have provided a wide range of important insights into the processes involved in visual coding. However, very few of these studies have employed natural scenes as masks, and little is known about how natural scenes affect visual detection thresholds. This report describes a study designed to obtain local contrast detection thresholds for a database of natural images. Via a three-alternative forced-choice experiment, thresholds were measured for detecting 3.7 cycles/degree vertically oriented log-Gabor noise targets placed within 85x85-pixel patches (1.9 degrees) drawn from 30 natural images. Thus, for each image, a masking map was obtained in which each entry denoted the RMS contrast threshold for detecting the log-Gabor noise target at the corresponding spatial location in the image. Qualitative observations showed that detection thresholds were affected by several patch properties, such as visual complexity, fineness of textures, sharpness, and overall luminance. The quantitative analysis showed that, except for the sharpness measure (Pearson correlation coefficient, CC, of 0.7), the tested low-level mask features correlated only weakly with the detection thresholds (CC less than 0.52). Three computational models of visual masking were used to predict the thresholds. The first was a feature-regression model, the second was an optimized gain-control model, and the third consisted of a three-layer convolutional neural network (CNN) architecture. In terms of CC and RMSE, the gain-control model performed best, with an overall CC of 0.83 and an RMSE of 5.2 dB. In terms of execution time, however, the CNN model performed best, with an average execution time of 5 seconds per image, compared to 40 seconds and 66 seconds for the feature-regression and gain-control models, respectively.
Furthermore, a structural facilitation model is proposed to improve the prediction for patches containing recognizable structures. Prediction performance increased for images with such structures: for the images geckos, child_swimming, and foxy, the CC rose from 0.68, 0.85, and 0.58 to 0.77, 0.87, and 0.63, respectively. Moreover, a subjective local-quality-assessment experiment showed that, for thresholds above 15 dB, the masking model predicted the local quality scores correctly in more than 95% of cases, to within 5% of the subject scores. Finally, a block-based quantization scheme using the masking model was proposed for still-image compression in the High Efficiency Video Coding (HEVC) standard. The compression gain was around 23% at threshold and around 30% at 1 dB beyond threshold.
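The abstract compares the models by the Pearson correlation coefficient (CC) and the RMSE in dB between predicted and measured thresholds. A minimal sketch of how such figures could be computed is given below. It is an illustration under stated assumptions, not the dissertation's code: the function names are hypothetical, and the conversion of RMS contrast to dB as 20*log10(contrast) is an assumed convention that the abstract does not spell out.

```python
import math


def pearson_cc(measured, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_m == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_m * var_p)


def rmse(measured, predicted):
    """Root-mean-square error; in dB if both inputs are in dB."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(squared / len(measured))


def contrast_to_db(rms_contrast):
    """Assumed convention: threshold in dB = 20 * log10(RMS contrast)."""
    if rms_contrast <= 0:
        raise ValueError("contrast must be positive")
    return 20.0 * math.log10(rms_contrast)
```

In use, each entry of a measured masking map and of a model's predicted map would first be converted to dB (if stored as linear contrast), flattened into two lists, and passed to `pearson_cc` and `rmse`.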
Keywords/Search Tags: Masking, Model, Natural scenes, Image, Detection thresholds