
Research On Visual Perception-based Video Coding

Posted on: 2010-10-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S L Xu
Full Text: PDF
GTID: 1118360275986667
Subject: Communication and Information System
Abstract/Summary:
Conventional video coding systems derive from rate-distortion theory, which focuses on eliminating the temporal, spatial, and codeword redundancies of video data. H.264/AVC is a landmark among these conventional systems because it achieves a significant coding gain. However, efficient analysis of video content has not yet been fully studied, so H.264/AVC cannot adequately compress visual redundancy. If the video coder can simulate the Human Visual System (HVS) and understand the video content, video compression will be established on a visual-observation basis, and both the coding framework and the coding efficiency will be significantly improved. In this dissertation, we propose our visual-perception-based video coder in four parts.

In the first part, we intensively study the observation behavior of the HVS and then propose a computational HVS model. An efficient computational HVS model can serve as the pre-processing module of the video encoder; with this module, the encoder can compress the video data unequally according to visual importance. In the temporal domain, the HVS is sensitive to moving objects, so regions with motion are more visually important than still ones. In the spatial domain, contrast is an important factor affecting visual observation: a strong contrast efficiently attracts the eyes and thus gains more attention. Based on these two points, we propose a visual attention model composed of moving-object extraction and a contrast map.

For moving-object extraction, conventional methods assume the camera is still and then extract the moving objects. In many applications, however, the camera moves with the target, so conventional methods are not applicable. To handle this, we first propose a camera motion estimation algorithm, then perform camera-motion-based background modeling, and finally extract the moving objects from the detected background (see the first sketch below).

For the contrast map, we model visual contrast on two factors: the luminance difference of two target pixels and their spatial distance. Because visual observation is affected by both the temporal and the spatial properties of the video data, we design a multi-model fusion scheme that combines the foreground extraction technique and the contrast map.

In the second part, we propose a novel just-noticeable-distortion (JND) model. Distortion below the JND is acceptable to the HVS, so the JND can be used to improve coding efficiency. JND can be modeled in several domains, such as the temporal, spatial, frequency, and pixel domains. To use the JND efficiently for coding, we model it in the pixel domain, choosing a background-luminance sub-model and a texture-masking sub-model. In particular, we design a computationally efficient background-luminance estimation template and propose an improved background-luminance model for video. In addition, we design a cross-shaped template to detect texture; its computational complexity is low and its performance is acceptable.

JND is in effect a descriptor of visual sensitivity to distortion: a larger JND means the target signal can hide more distortion. We therefore propose an adaptive quantization strategy controlled by the JND. Because the JND is modeled in the pixel domain, the JND-based quantization is placed after the DCT module. Furthermore, we propose a JND prediction technique to avoid transmitting side information.
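To make the moving-object extraction of the first part concrete, here is a minimal sketch of its three stages: a robust global (camera) motion estimate, motion-compensated background modeling, and foreground thresholding. The median estimator, the running-average background update, and all thresholds are illustrative assumptions, not the dissertation's exact design.

```python
import numpy as np

def global_motion(mv_field):
    """Camera (global) motion as the median of the block motion vectors
    (mv_field: array of (dx, dy) vectors, any shape ending in 2).
    Median voting is a common robust choice; the dissertation's actual
    estimator is not specified here."""
    return np.median(mv_field.reshape(-1, 2), axis=0)

def extract_foreground(frame, bg_model, dx, dy, alpha=0.05, thresh=25):
    """Sketch of camera-motion-compensated background modeling: shift the
    running background by the estimated camera motion, flag pixels far
    from it as moving objects, then blend the frame into the model.
    np.roll wraps around at the borders; a real coder would handle the
    uncovered border regions explicitly."""
    bg = np.roll(bg_model, (int(round(dy)), int(round(dx))), axis=(0, 1))
    fg = np.abs(frame.astype(np.float64) - bg) > thresh   # moving-object mask
    bg_new = np.where(fg, bg, (1 - alpha) * bg + alpha * frame)
    return fg, bg_new
```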
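The contrast map can be sketched in the same spirit: each pixel accumulates luminance differences to its neighbors, down-weighted by spatial distance. The Gaussian distance weighting below is an assumption; the text only states that the two modeling factors are luminance difference and spatial distance.

```python
import numpy as np

def contrast_map(luma, radius=3, sigma=1.5):
    """Pixel-domain contrast: distance-weighted sum of absolute luminance
    differences inside a (2*radius+1)^2 neighborhood. Window size and the
    Gaussian weight are illustrative assumptions."""
    h, w = luma.shape
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    weight = np.exp(-(ys**2 + xs**2) / (2 * sigma**2))
    weight[radius, radius] = 0.0        # a pixel has no contrast with itself
    pad = np.pad(luma.astype(np.float64), radius, mode='edge')
    cmap = np.zeros((h, w), dtype=np.float64)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = pad[radius + dy:radius + dy + h,
                          radius + dx:radius + dx + w]
            cmap += weight[dy + radius, dx + radius] * np.abs(luma - shifted)
    return cmap / weight.sum()
```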
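For the pixel-domain JND of the second part, a minimal sketch combining the two sub-models might look as follows. The 5x5 averaging window, the specific luminance-adaptation curve (borrowed from classic Chou-style pixel-domain JND models), the 3x3 cross-shaped gradient template, and the max() fusion are assumptions for illustration, not the dissertation's exact parameters.

```python
import numpy as np
from scipy.ndimage import convolve, uniform_filter

def pixel_jnd(luma):
    """Pixel-domain JND = max(background-luminance masking, texture masking)."""
    y = luma.astype(np.float64)
    bg = uniform_filter(y, size=5)              # local background luminance
    # Visibility threshold vs. background luminance (bright/dark adaptation),
    # following the classic Chou-style piecewise curve.
    lum_jnd = np.where(bg <= 127,
                       17.0 * (1.0 - np.sqrt(bg / 127.0)) + 3.0,
                       3.0 / 128.0 * (bg - 127.0) + 3.0)
    # Texture masking from a cross-shaped gradient template: busier regions
    # can hide more distortion.
    cross = np.array([[ 0, -1,  0],
                      [-1,  4, -1],
                      [ 0, -1,  0]], dtype=np.float64)
    tex_jnd = 0.1 * np.abs(convolve(y, cross))  # masking scale is an assumption
    return np.maximum(lum_jnd, tex_jnd)
```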
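Because a larger JND hides more distortion, the JND can steer the quantizer. The sketch below maps each block's mean pixel-domain JND to a QP offset; the logarithmic mapping exploits the fact that in H.264/AVC an increase of 6 in QP doubles the quantization step. The reference JND, the scaling, and the clipping range are assumptions, and the dissertation's JND prediction step, which removes the need to signal these offsets, is not reproduced here.

```python
import numpy as np

def adaptive_qp(jnd_map, base_qp, block=16, ref_jnd=4.0, max_offset=6):
    """Per-block QP from the JND map: blocks whose mean JND is above the
    reference get a coarser quantizer, and vice versa. Assumes the frame
    dimensions are multiples of the block size."""
    h, w = jnd_map.shape
    qps = np.empty((h // block, w // block), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            m = jnd_map[by * block:(by + 1) * block,
                        bx * block:(bx + 1) * block].mean()
            # +3 QP per doubling of the maskable distortion (assumed mapping).
            offset = int(round(3 * np.log2(max(m, 1e-3) / ref_jnd)))
            qps[by, bx] = base_qp + max(-max_offset, min(max_offset, offset))
    return qps
```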
Experiments show that the proposed perceptual coder saves 5%-45% of the bitrate while keeping the subjective quality about the same.

This dissertation also proposes other visual-observation-based techniques for the video coding system: 1) an adaptive loop filter, which adaptively controls its filtering strength according to the video content and performs better than its counterpart in H.264/AVC; 2) a probability-updating-based entropy coder, which supports real-time probability learning and updating (see the sketch below); both its R-D performance and its complexity lie between those of CAVLC and CABAC.
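The probability-updating entropy coder is described only at a high level, but the core operation of any such coder is an online estimate of symbol probabilities. A common fixed-point update, shown below as an assumed stand-in for the dissertation's scheme, nudges the estimate toward each observed bit; the adaptation shift controls how fast the coder tracks the source.

```python
def update_prob(state, bit, shift=5):
    """Exponentially weighted, fixed-point estimate of P(bit == 1).
    state lies in [0, 2**15); larger shift = slower adaptation.
    The 15-bit precision and the shift value are assumptions."""
    if bit:
        state += ((1 << 15) - state) >> shift
    else:
        state -= state >> shift
    return state
```

Adapting faster than CAVLC's fixed tables while using a simpler state machine than CABAC's context modeling is consistent with the claimed middle ground in both R-D performance and complexity.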
Keywords/Search Tags: H.264/AVC, visual saliency map, JND, quantization, loop filter, entropy coding