The Research On H.264 Video Transcoding And Scalable Video Coding

Posted on: 2009-12-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z G Liu
GTID: 1118360245996097
Subject: Communication and Information System
Abstract/Summary:
Video sequences are used in widely varying application environments that differ in transmission channel, storage medium, and display terminal. Video adaptation addresses this variety with two families of techniques, video transcoding and scalable video coding, each of which offers a corresponding solution. For instance, a video transcoding module can be deployed at a network access point, where the incoming video is transcoded directly into the required format. In scalable video coding, by contrast, the source video is encoded once, and each decoder receives only the part of the bitstream that its application requires.

Video transcoding can be classified as homogeneous or heterogeneous according to the standards of the incoming and outgoing bitstreams. Homogeneous transcoding has three main aspects: spatial resolution transcoding, temporal resolution transcoding, and bit rate transcoding. The simplest implementation of video transcoding is cascaded pixel-domain transcoding, which is also the most computationally complex scheme. To speed up the re-encoding process, the information decoded from the incoming bitstream should be reused in transcoding.

In scalable video coding, the source video is encoded at the highest resolution, and the decoder receives a partial bitstream matching the specific rate and resolution required by a given application, which relieves the burden on the encoder. The popular hybrid scheme of motion compensated prediction and block transform has a recursive structure, so it causes a "drift" effect when the decoder receives an incomplete bitstream. The wavelet encoding scheme based on motion compensated temporal filtering (MCTF), which abandons the recursive structure entirely, offers highly flexible bitstream scalability across spatial, temporal, and quality resolutions. However, in the conventional MCTF encoding scheme the group of picture structure is fixed and does not take into account the variation of motion activity in real video sequences.

The main contributions of this thesis to video transcoding and scalable video coding are as follows:

1. For intra prediction mode selection in spatial resolution transcoding, the percentage of non-zero coefficients (nz_per) in the pre-coded frame is used as the criterion for selecting the macroblock mode in the downsized frame. A Th_I_Q_r model is proposed that describes the relationship between the re-quantization parameter and the nz_per threshold by an exponential curve. The model is converted into a linear regression model, and its parameters are estimated by the least squares method. To adapt the model to a specific video sequence, a parameter update process is proposed that uses the macroblock modes selected during re-encoding. After the intra macroblock mode has been selected, a fast intra prediction mode selection is applied that uses the incoming macroblock modes and prediction modes of the pre-coded frame, which greatly reduces computational complexity. According to the experimental results, with a maximum PSNR loss of about 0.6 dB, the proposal reduces computational complexity by about 20% to 25% compared with the full search algorithm.

2. For inter mode selection in spatial resolution transcoding, nz_per is used to classify the motion activity of the current macroblock, and some candidate macroblock modes are skipped according to the classification result. A Th_P_Q_r model is proposed to describe the relationship between the nz_per threshold and the re-quantization parameter in the re-encoding process. As in the Th_I_Q_r model, an exponential curve describes this relationship, and a parameter update process is also proposed. The initial motion vectors of a macroblock are calculated from the pre-coded frame and are not very precise, especially when the re-quantization parameter is large. A new motion vector refinement method is therefore proposed that uses nz_per as the criterion for determining the refinement steps. The refinement steps increase with the re-quantization parameter, and areas with high nz_per values, which correspond to high motion activity, use longer refinement steps. The Th_P_Q_r model is also extended to temporal resolution transcoding. According to the experimental results, the proposed method reduces re-encoding computational complexity by a factor of about 15 to 20 compared with the full search algorithm, with a maximum PSNR degradation of 1.1 dB, and achieves a speed-up of about 35 times in the macroblock mode selection part.

3. Classification is introduced into macroblock mode selection for the first time in this thesis. A fast mode decision scheme based on the support vector machine is proposed. The feature vectors used in the training and classification stages of the support vector machine are extracted from the incoming bitstream, including motion vectors, residual data, pre-coded macroblock modes, and quantization parameters, among others. An H.264 video transcoder that performs spatial resolution, temporal resolution, and bit rate transcoding simultaneously is implemented on the basis of classification for the first time. Extensive experiments cover both intra and inter mode selection. In intra mode selection, the proposed method reduces computational complexity by a factor of about 15 compared with the full search algorithm, with a maximum PSNR degradation of 0.3 dB. In inter mode selection, it achieves a speed-up of about 25 to 30 times, with a PSNR degradation of 0.2 to 1.2 dB depending on the sequence and bit rate.

4. In the MCTF encoding scheme, an adaptive group of picture structure selection scheme is proposed in which the group of picture size and the low-pass frame position are selected on the basis of mutual information. The temporal decomposition process is then determined adaptively according to the selected group of picture structure. Extensive experiments compare the compression performance of the proposed method with the conventional MCTF encoding scheme and with the adaptive group of picture structure in the standard scalable video coding model. The proposed low-pass frame selection improves compression quality by about 0.3 to 0.5 dB over the conventional scheme in video sequences with high motion activity. In scenes with uneven variation of motion activity, e.g. frequent shot cuts, the proposed adaptive group of picture size achieves better compression than the conventional scheme. Compared with the adaptive group of picture in the standard scalable video coding model, the proposed group of picture structure scheme yields an improvement of about 0.2 to 0.8 dB in sequences with high motion activity or shot cuts, especially abrupt shot cuts.

In summary, on the video transcoding side, an H.264 video transcoder that performs spatial resolution, temporal resolution, and bit rate transcoding simultaneously is implemented in this thesis for the first time. In the proposal, the incoming H.264 bitstream is fully decoded; after the image format (spatial resolution, temporal resolution, image quality) has been changed, the outgoing bitstream is re-encoded in H.264 as well. On the scalable video coding side, an adaptive group of picture structure selection based on mutual information is proposed. The final part discusses open issues and analyzes future directions.
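
The threshold models in contributions 1 and 2 are described only as an exponential curve that is converted into a linear regression model and fitted by least squares. The sketch below illustrates that general technique and is not the implementation used in the thesis. The functional form T(Q) = a * exp(b * Q), the function names, and the sample data are all assumptions made for illustration; the abstract does not give the actual model parameters.

```python
import math


def fit_exponential_threshold(qps, thresholds):
    """Fit T(Q) = a * exp(b * Q) by ordinary least squares.

    Taking logarithms gives ln T = ln a + b * Q, which is a straight
    line in Q, so the closed-form simple linear regression applies.
    The single-exponential form is an assumption, not the thesis model.
    """
    if len(qps) != len(thresholds) or len(qps) < 2:
        raise ValueError("need at least two (QP, threshold) pairs")
    if any(t <= 0 for t in thresholds):
        raise ValueError("thresholds must be positive to take logarithms")

    log_t = [math.log(t) for t in thresholds]
    n = len(qps)
    mean_q = sum(qps) / n
    mean_y = sum(log_t) / n

    sxx = sum((q - mean_q) ** 2 for q in qps)
    if sxx == 0:
        raise ValueError("QP values must not all be identical")
    sxy = sum((q - mean_q) * (y - mean_y) for q, y in zip(qps, log_t))

    b = sxy / sxx                       # slope of the line in log space
    a = math.exp(mean_y - b * mean_q)   # intercept, mapped back from ln a
    return a, b


def predict_threshold(a, b, qp):
    """Evaluate the fitted curve at a given re-quantization parameter."""
    return a * math.exp(b * qp)


# Synthetic, noise-free samples generated from made-up parameters.
# They are NOT measurements from the thesis.
true_a, true_b = 0.9, -0.05
sample_qps = [24, 28, 32, 36, 40]
sample_thresholds = [true_a * math.exp(true_b * q) for q in sample_qps]

a, b = fit_exponential_threshold(sample_qps, sample_thresholds)
print(f"fitted a={a:.4f}, b={b:.4f}")
print(f"threshold at QP 30: {predict_threshold(a, b, 30):.4f}")
```

With noise-free input the fit recovers the generating parameters up to floating-point error. A per-sequence parameter update of the kind the abstract mentions could, under the same assumptions, be approximated by refitting as new (QP, threshold) observations arrive during re-encoding.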
Keywords/Search Tags: H.264, Video Transcoding, Macroblock Mode Selection, Linear Regression, Mutual Information, Motion Compensated Temporal Filtering, Group of Picture