Research on Structure Similarity Based Perceptual Surveillance Video Coding and Super-Resolution

Posted on: 2014-05-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Xia
GTID: 1318330398454864
Subject: Communication and Information System
Abstract/Summary:
Surveillance images serve as important clues and evidence and therefore have a major impact on video investigation. In practice, the distance between camera and target is usually large, and surveillance images are often corrupted by device noise and compression noise, so the salient regions of the picture have low resolution and poor quality. Statistics show that more than 60% of daytime images have low resolution and poor quality, and more than 95% of nighttime images have the same problem. In criminal investigation, the key targets are the objects entering and leaving the scene, which may be suspects or potential witnesses. Face portrait experts divide the characteristics of these key objects into global features and local features: global features refer to the outline and structure of the object, and local features refer to the remaining features. Psychology experiments show that global features, which are defined as structural features in this paper, are more important than local features for remembrance, identification and analysis. The structural features of salient objects are therefore essential for video investigation, and preserving them is an urgent problem for surveillance video coding and processing.
Three aspects relate to enhancing targets in a surveillance system: foreground object detection, object-based video coding, and image super-resolution for reconstructing object structure information.
Building on traditional perceptual video coding and face super-resolution, it is possible to maintain and enhance the details of targets of interest when the bit rate is limited and the image resolution is low.
In this paper, we first review visual perceptual coding and image super-resolution technology, and then analyze perceptual video coding based on the visual selective mechanism, perceptual video coding based on the masking mechanism, and learning-based image super-resolution. We conclude that video coding and processing technology based on pixel similarity has shortcomings in bit allocation and regularization constraints for surveillance systems. From the viewpoint of feature coding and processing, increasing the fidelity of global features at both the encoder and the decoder can effectively improve the recognition results for surveillance images.
Therefore, supported by projects of the National Natural Science Foundation of China (No. 61070080, No. 61172173, No. 60772106), we study surveillance video coding and super-resolution technology based on structural similarity constraints. We propose a robust video saliency analysis algorithm, a perceptual video coding algorithm and robust face super-resolution algorithms that improve the efficiency of video coding and increase the fidelity of global features at both the encoder and the decoder, which makes a theoretical contribution to traditional video coding and processing. In the practical application of video investigation, the technology can also improve the effectiveness of surveillance video.
The main research results are as follows:
(1) Background subtraction based on spatio-temporal saliency
In practical applications, traditional foreground detection methods are affected by lighting changes, environmental noise and changes in the motion rate of foreground objects, so they extract foreground objects ineffectively or even fail to detect them. Related experiments [41] showed that missed detection of slowly moving objects can reduce the detection rate by 15.8%, which in turn degrades the coding of structural features. To solve this problem, we propose a background subtraction technique based on spatio-temporal saliency. The algorithm builds on a background model based on the Gaussian mixture model (GMM). By analyzing the impact of lighting changes, environmental noise and foreground object motion rate, as well as the relationship between spatial saliency and temporal saliency, we establish a spatio-temporal saliency model and extend the background update from a fixed rate to an adaptive rate. The proposed method effectively improves the extraction of regions of interest and lays the foundation for subsequent video coding and processing.
(2) Perceptual video coding based on Foveated JND and principal structure analysis
In video investigation, the areas of salient objects need high fidelity, and the non-salient regions need good visual quality. The traditional perceptual video coding method based on just noticeable distortion (JND) allots many coding resources to non-salient characteristics, which leaves considerable visual selective redundancy and cognitive redundancy and prevents the coding resources from focusing on the identifying characteristics. Related experiments [55] showed that there is more than 10% coding redundancy in traditional JND-based perceptual video coding. To solve this problem, we propose the DCT Foveated JND model and introduce it into motion compensation.
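The move from a fixed to an adaptive background update rate in result (1) can be illustrated with a small sketch. The abstract does not give the update formula, so everything here is an assumption made for illustration: a single running mean per pixel stands in for the full GMM, the saliency value is taken as given in [0, 1], and the mapping `rate = base_rate * (1 - saliency)` as well as the names `adaptive_rate`, `update_background`, `is_foreground` and `run` are hypothetical.

```python
def adaptive_rate(base_rate, saliency):
    """Shrink the background learning rate as saliency grows (assumed mapping)."""
    s = min(max(saliency, 0.0), 1.0)  # clamp saliency to [0, 1]
    return base_rate * (1.0 - s)


def update_background(mean, pixel, saliency, base_rate=0.05):
    """Blend the new pixel value into the background mean at the adaptive rate."""
    rate = adaptive_rate(base_rate, saliency)
    return (1.0 - rate) * mean + rate * pixel


def is_foreground(mean, pixel, threshold=25.0):
    """Flag the pixel as foreground when it deviates strongly from the background."""
    return abs(pixel - mean) > threshold


def run(pixels, saliencies, initial_mean, base_rate=0.05):
    """Classify each frame's pixel first, then update the background model."""
    mean = initial_mean
    flags = []
    for pixel, saliency in zip(pixels, saliencies):
        flags.append(is_foreground(mean, pixel))
        mean = update_background(mean, pixel, saliency, base_rate)
    return flags, mean


# A slow object holds the value 200 over a background of 100 for 100 frames.
frames = [200.0] * 100
# Saliency 0 everywhere reproduces the fixed-rate update.
fixed_flags, fixed_mean = run(frames, [0.0] * 100, 100.0)
# Saliency 1 everywhere freezes the update where the object is salient.
adaptive_flags, adaptive_mean = run(frames, [1.0] * 100, 100.0)
```

In this toy setting the fixed-rate model gradually absorbs the slow object into the background and stops flagging it, while the saliency-driven rate keeps it detected in every frame. A real system would need a saliency estimate per pixel and a rule for releasing objects that truly become part of the background.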
The DCT Foveated JND algorithm builds on the traditional DCT-domain JND model. By analyzing the relationship between the visual fovea and the image frequency masking threshold, we introduce the DCT Foveated JND into the residual filtering model. In addition, we propose a video coding method based on structural similarity constraints and apply the structural distortion model to rate-distortion optimization. We divide the information in each image block into principal structure information and the remaining structure information. By analyzing the impact of the different hierarchical levels of image structure on image cognition, we establish an image distortion model based on principal structure analysis, which extends the traditional data similarity measure to a perceived structural similarity measure. The method improves the fidelity of the salient regions while keeping the visual quality of the non-salient regions unchanged, which lays the foundation for subsequent image processing.
(3) Face super-resolution based on structural similarity regularization
Learning-based super-resolution methods use training images to synthesize the high-resolution image and rely on a coefficient sparsity assumption to improve the synthesis. However, surveillance images often contain noise, so when the input image is linearly represented by training images, the noise component is also carried into the synthesis coefficients. The reconstructed high-resolution image may therefore contain noise, which degrades the super-resolution result. Related experiments [87] showed that, in noisy environments (Gaussian noise with variance 10), the face super-resolution method based on the sparsity assumption was only slightly better than traditional interpolation (an MSE difference of 0.3). To solve this problem, we propose a face super-resolution method based on structural similarity regularization. The algorithm builds on traditional learning-based image super-resolution.
By analyzing the effect of the structural similarity prior on noise suppression, we establish an adaptive representation model of prior knowledge, which extends image super-resolution from statistical prior knowledge to adaptive prior knowledge. This method improves the effectiveness and practicality of image super-resolution.
In summary, this paper proposes surveillance video coding and super-resolution based on structural similarity constraints, which overcomes some limitations of traditional video coding and processing, and proposes several tools for enhancing salient features. The work therefore has both theoretical and practical value. As a next step, we will study multi-frame super-resolution and video quality assessment based on structural similarity constraints.
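The learning-based pipeline behind result (3) represents a low-resolution input as a linear combination of low-resolution training patches and applies the same weights to the paired high-resolution patches. The sketch below shows that pipeline with an adaptive penalty on the weights. The abstract does not specify the structural similarity regularizer, so a distance-weighted ridge penalty is swapped in here as a simple stand-in: training patches that differ more from the input are penalized more. The function names, the penalty form and the parameter `lam` are all assumptions, not the dissertation's method.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def solve(a, b):
    """Solve a small dense system a w = b by Gaussian elimination with pivoting.

    Assumes the regularized system is non-singular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def representation_weights(x, lr_train, lam):
    """Minimize ||x - sum_i w_i l_i||^2 + lam * sum_i d_i w_i^2.

    d_i is the squared distance between x and training patch l_i
    (an assumed stand-in for a structural dissimilarity measure).
    """
    n = len(lr_train)
    penalties = [sum((p - q) ** 2 for p, q in zip(x, l)) for l in lr_train]
    gram = [[dot(lr_train[i], lr_train[j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        gram[i][i] += lam * penalties[i]
    rhs = [dot(l, x) for l in lr_train]
    return solve(gram, rhs)


def super_resolve(x, lr_train, hr_train, lam=1.0):
    """Apply the low-resolution weights to the paired high-resolution patches."""
    w = representation_weights(x, lr_train, lam)
    size = len(hr_train[0])
    return [sum(w[i] * hr_train[i][k] for i in range(len(w))) for k in range(size)]


# Toy paired dictionaries: 2-pixel low-res patches, 4-pixel high-res patches.
lr_train = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hr_train = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
hr = super_resolve([1.0, 0.0], lr_train, hr_train, lam=1.0)
noisy_w = representation_weights([1.1, -0.05], lr_train, 1.0)
```

When the input equals a training patch, the penalty on that patch is zero and the sketch returns its high-resolution pair. For a slightly noisy input the largest weight still falls on the nearest training patch, which is the kind of noise suppression the adaptive prior is meant to provide.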
Keywords/Search Tags: Video Coding, Saliency, Just Noticeable Distortion, Face Super-resolution