
Research On Modeling Of Scalable Video Coding And Perceptual Quality

Posted on: 2010-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z N Wang
GTID: 2178360272996304
Subject: Circuits and Systems
Abstract/Summary:
Scalable video coding (SVC) refers to coding a video into an embedded bitstream that yields high quality when completely decoded and lower quality when the bitstream is truncated. When a video is coded into a scalable stream with spatial, temporal, and amplitude scalability, the same video content may be delivered with varying frame rate, frame size, or quantization stepsize, depending on the sustainable transmission rate, display resolution, and battery status (for battery-powered devices) at the receiver. Scalable video is particularly attractive for video multicast, where receivers of the same video often have different sustainable transmission rates with the server and varying decoding and display capabilities. Even for unicast, SVC allows the server to store just one bitstream but send different portions of the stream to receivers with different bandwidth and energy resources.

In this paper, we focus on modeling the impact of temporal and amplitude resolutions (in terms of frame rate and quantization stepsize, respectively) on both rate and quality. We further apply these models to solve the rate-constrained SVC adaptation problem, assuming the spatial resolution is determined by other considerations (e.g., the display size of the receiver).

Our quality model relates the perceptual quality to the quantization stepsize and frame rate. It is derived from our prior work, which uses the product of two terms: a metric that assesses the quality of a quantized video at the highest frame rate, based on the PSNR of decoded frames, and a temporal correction factor for quality (TCFQ), which reduces the quality assigned by the first metric according to the actual frame rate. In the quality model proposed here, we replace the first term with a metric that relates the quality of the highest-frame-rate video to the quantization stepsize.
Each term has a single parameter, and the overall model fits the subjective ratings very well, with an average Pearson correlation of 0.984 over four test sequences.

Our rate model predicts the rate from the quantization stepsize and frame rate. It also uses the product of two terms: a metric that describes how the rate changes with the quantization stepsize when the video is coded at the highest frame rate, and a temporal correction factor for rate (TCFR), which corrects the rate predicted by the first metric according to the actual frame rate. As with the quality model, it has only two parameters and fits the measured rates of decoded SVC video from different temporal and amplitude layers very accurately (with an average Pearson correlation of 0.998 over four sequences).

We further apply these models to rate-constrained SVC bitstream adaptation, where the problem is to determine the frame rate and quantization stepsize that lead to the highest perceptual quality for a given target rate. We derive the optimal frame rate t_opt and quantization stepsize q_opt, both as functions of the rate R, first by assuming t can vary continuously to provide theoretical insights, and then by considering the feasible set of discrete frame rates afforded by the hierarchical temporal prediction structure.

In this paper we examine the impact of frame rate t and quantization stepsize q on the rate and perceptual quality of scalable video. Both models are developed from the key observation in experimental data that the relative reduction of either rate or quality when the frame rate decreases is largely independent of the quantization stepsize. This observation enables us to express both rate and quality as the product of a function of q and a function of t. The proposed rate and quality models are analytically tractable, each requiring only two content-dependent parameters.
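The separability observation can be written compactly as follows. This is only a notational sketch: the abstract gives no formulas, so the symbols t_max (highest frame rate) and the function names R_q and Q_q (rate and quality at the highest frame rate) are assumptions introduced here for illustration.

```latex
% Observation: the relative drop with frame rate does not depend on q
\begin{align}
  \frac{R(q,t)}{R(q,t_{\max})} &\approx \mathrm{TCFR}(t), &
  \frac{Q(q,t)}{Q(q,t_{\max})} &\approx \mathrm{TCFQ}(t).
\end{align}
% Consequence: each model factors into a q-term and a t-term
\begin{align}
  R(q,t) &= R_q(q)\,\mathrm{TCFR}(t), &
  Q(q,t) &= Q_q(q)\,\mathrm{TCFQ}(t),
\end{align}
% with R_q(q) = R(q,t_{\max}), Q_q(q) = Q(q,t_{\max}),
% and TCFR(t_{\max}) = TCFQ(t_{\max}) = 1.
```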
The rate model fits the measured rates very accurately, with an average Pearson correlation of 0.998 over four video sequences. The quality model also matches the mean opinion scores (MOS) from subjective tests very well, with an average Pearson correlation of 0.984.

The proposed rate and quality models have other applications beyond SVC bitstream adaptation. One important application is in non-scalable encoder optimization, e.g., determining the optimal encoding frame rate for a target bit rate. They can also be used for scalable encoder optimization, e.g., determining the appropriate temporal and amplitude layers to generate and include at different rate ranges.
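To make the rate-constrained adaptation concrete, here is a minimal Python sketch of the discrete-frame-rate case. It rests on several assumptions: the abstract does not state the functional forms, so the power-law rate terms and exponential quality terms are placeholders chosen only because each term has one parameter (a, b for rate; c, d for quality). The constants Q_MIN and T_MAX, the parameter values in the usage note, and all function names are hypothetical. The stepsize q is also treated as continuous here, which is a simplification.

```python
import math

Q_MIN = 16.0   # smallest quantization stepsize (hypothetical value)
T_MAX = 30.0   # highest frame rate in Hz (hypothetical value)

def rate(q, t, r_max, a, b):
    """Separable rate model: (function of q) * (function of t).
    The power-law forms are assumed for illustration only."""
    return r_max * (q / Q_MIN) ** (-a) * (t / T_MAX) ** b

def quality(q, t, c, d):
    """Separable normalized quality model, equal to 1 at (Q_MIN, T_MAX).
    The exponential forms are assumed for illustration only."""
    q_term = math.exp(-c * (q / Q_MIN - 1.0))
    t_term = (1.0 - math.exp(-d * t / T_MAX)) / (1.0 - math.exp(-d))
    return q_term * t_term

def stepsize_for_rate(target, t, r_max, a, b):
    """Invert the assumed rate model for q at frame rate t.
    q is clamped so it never falls below Q_MIN."""
    q = Q_MIN * (target / (r_max * (t / T_MAX) ** b)) ** (-1.0 / a)
    return max(q, Q_MIN)

def adapt(target, frame_rates, r_max, a, b, c, d):
    """Try each feasible frame rate, spend the rate budget on q,
    and keep the (t, q, quality) triple with the highest quality."""
    best = None
    for t in frame_rates:
        q = stepsize_for_rate(target, t, r_max, a, b)
        score = quality(q, t, c, d)
        if best is None or score > best[2]:
            best = (t, q, score)
    return best
```

For example, `adapt(200.0, [3.75, 7.5, 15.0, 30.0], r_max=1000.0, a=1.2, b=0.6, c=0.1, d=5.0)` returns the frame rate, stepsize, and predicted quality for a 200-unit rate budget under these made-up parameters. The frame-rate list mimics a dyadic hierarchical temporal structure; the exhaustive loop is used because the feasible set is tiny.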
Keywords/Search Tags: Perceptual