Research On Key Knowledge Of Video Coding Based On Facial Feature Localization And Face Modeling Theory

Posted on: 2013-02-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X J Fan
Full Text: PDF
GTID: 1118330371494829
Subject: Computer application technology
Abstract/Summary:
As an essential feature that distinguishes humans from other animals, the human face is the main information carrier in interpersonal communication and social activities. For this reason, studies of the human face are of great theoretical and practical significance. In particular, the importance of human face research is growing sharply with the development of real-time multimedia services such as video conferencing, picture phones, and news broadcasting systems, all of which are directly or indirectly related to the human face. In the video coding and communication field, the content of these applications is generally referred to as a "conversational video sequence". This thesis studies video compression methods and techniques for conversational video sequences, integrating face detection, facial feature extraction, and face modeling.

In classic video coding theory, every part of a picture is compressed sequentially and with equal importance. Originally, the compression ratio and the peak signal-to-noise ratio (PSNR) were taken as the two basic indexes for evaluating a video coding algorithm. As research progressed, more and more people recognized the special significance of the region of interest (ROI). In fact, users tend to judge the acceptability of a coded video subjectively, by observing the quality of its ROIs. How to guarantee the quality of the human face ROI is therefore a frontier subject in conversational video coding.

Limited network bandwidth, limited computational power, and information loss during transmission are the three chief factors that restrict video quality at the receiving end, especially in conversational video coding over channels with low bandwidth and high bit error rates.
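
To illustrate why ROI quality can diverge from whole-frame quality, the following minimal Python sketch computes PSNR over a full frame and over a masked face region only. The function name `psnr`, the list-of-rows frame layout, and the boolean ROI mask are assumptions made for this example and do not come from the thesis.

```python
import math


def psnr(reference, reconstructed, mask=None, peak=255.0):
    """Return the PSNR in dB between two equally sized frames.

    Frames are lists of rows of sample values. If `mask` is given
    (same shape, truthy inside the region of interest), only the
    masked samples contribute to the mean squared error.
    """
    total = 0.0
    count = 0
    for y, (ref_row, rec_row) in enumerate(zip(reference, reconstructed)):
        for x, (ref_px, rec_px) in enumerate(zip(ref_row, rec_row)):
            if mask is None or mask[y][x]:
                total += (ref_px - rec_px) ** 2
                count += 1
    if count == 0:
        raise ValueError("no samples selected")
    mse = total / count
    if mse == 0:
        return math.inf  # identical samples: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy 4x4 frame whose only coding errors fall inside a 2x2 "face" region.
reference = [[100] * 4 for _ in range(4)]
reconstructed = [row[:] for row in reference]
roi_mask = [[False] * 4 for _ in range(4)]
for y in (1, 2):
    for x in (1, 2):
        reconstructed[y][x] = 110
        roi_mask[y][x] = True

frame_psnr = psnr(reference, reconstructed)
roi_psnr = psnr(reference, reconstructed, roi_mask)
```

In this toy case the whole-frame PSNR is higher than the ROI PSNR because the error-free background dilutes the errors concentrated in the face region, which is why a frame-level score alone can hide poor face quality.
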
In this thesis, two error-resilient strategies and one error concealment approach were investigated in order to achieve the best coding quality in the human face ROI over a bit-rate-constrained channel.

Firstly, the thesis proposed a bit allocation and resource optimization scheme to protect the human face ROI and its features. The scheme includes three preprocessing steps. To extract the human face ROI efficiently, a motion-based sub-image rejection step was added to the pyramid search structure of the AdaBoost face detection method. To guarantee the accuracy of the extracted human face ROI, the result was verified with the aid of facial color statistics. To refine the actual face contour and the other facial feature locations, we optimized the selection of the search range and convergence direction parameters, as well as the energy equilibrium condition, in the Snake algorithm and the Active Shape Model (ASM). After assigning a priority to each macroblock (MB) according to facial geometry, the scheme used a relatively precise adaptive prediction model for the mean absolute difference (MAD) and a quantization parameter (QP) updating rule to obtain the final bit allocation strategy. In addition, the scheme improved coding resource optimization through a thorough analysis of MB modes and other coding options. Simulation results demonstrated that the scheme gives reasonable human face ROI and facial feature locations for each frame, as well as optimal bit rates and resources for each MB, so the coding quality of the human face ROI and its features was well preserved. Comparison with the basic bit allocation algorithm in JM 9.8 and with other bit allocation methods showed that the proposed scheme significantly improved the PSNR of the human face ROI. Meanwhile, owing to the optimization of coding resources, both the gap between the target and actual bits of each frame and the total coding time were reduced.
In addition, subjective assessment further confirmed that the proposed scheme provides much better video reconstruction quality.

Secondly, the thesis introduced the global rate-distortion optimization (RDO) problem and its traditional solution, and discussed the importance of coding dependencies in the encoding process. Taking temporal dependency as the only coding dependency in conversational video coding, we proposed a novel global RDO framework made up of comprehensive optimization of the human face ROI and individual optimization of the non-face ROI. This framework works well in the common one-pass structure: the comprehensive optimization takes the temporal error propagation of the human face ROI into account, while the individual optimization still follows the traditional RDO rule but shares a common Lagrange multiplier with the former. To obtain the total distortion of a given human face ROI in the comprehensive optimization, we constructed an alternative temporal propagation chain for the human face ROI based on forward motion search. With this chain, a source distortion temporal propagation model for the human face ROI was then developed, in which a characteristic function based on the Laplace distribution of the transformed residuals uses motion compensation errors to estimate quantization errors, efficiently reducing the computational complexity. Simulation results demonstrated that the constructed temporal propagation chain is efficient and reasonable, that the proposed source distortion temporal propagation model estimates error propagation well, and that the framework provides an effective way to perform RDO for the human face ROI in conversational video coding.
Compared with the independent RDO method in JM 15.1 and with another dependent RDO method (RDO-Q), the proposed framework achieved clear BDPSNR (Bjontegaard Delta PSNR) gains and BDBR (Bjontegaard Delta Bit Rate) savings for the human face ROI and the entire sequence simultaneously.

Thirdly, the thesis studied error concealment for conversational video coding and proposed a spatial error concealment strategy aided by a realistic human face model, which comprises three basic parts. First, because the efficiency of the active appearance model (AAM) is closely associated with the initial fitting position (fitting center, fitting orientation) and the fitting instance (shape instance, appearance instance), we developed a coarse-grained facial feature point localization method to calculate the plane deflection angle and the profile deflection angle, from which the fitting center, fitting orientation, and shape instance are determined. The appearance instance was then selected using texture similarity. By tuning these initial fitting parameters, the AAM was improved and the final facial feature points became more precise.

Second, we designed a pose adjustment, shape matching, and texture mapping method for constructing a realistic human face model by combining the obtained AAM facial feature points with the Candide-3 generic wire-frame face model. Finally, the category of each damaged MB was determined from the pre-concealment result and the available realistic face model, so that different spatial error concealment methods could be selected adaptively. In particular, for texture MBs in the face ROI, we provided a way to search for the optimal replacement of a damaged MB in the plane mapping result of the face model. Simulation results demonstrated that the improved AAM algorithm is superior to the original AAM in facial feature point extraction and that the reconstructed human face is more realistic.
The human face model construction method provides a reasonable solution for recovering depth information from a single 2D image. Compared with two spatial error concealment methods implemented in JM 17.1, namely bilinear interpolation and adaptive directional interpolation, the proposed method provides excellent error concealment results for damaged frames, especially for destroyed ROI areas, in both interleaved and dispersed packing styles. To some extent, the proposed spatial error concealment method solves the problem of recovering the face and facial features in conversational video coding.
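
For context, bilinear interpolation, one of the two baseline methods named above, can be sketched as follows. This is a simplified illustration, not the JM 17.1 implementation and not the model-aided method proposed in the thesis: the function name `conceal_block_bilinear`, the list-of-rows frame layout, and the assumption that every in-frame pixel bordering the lost block was received correctly are all choices made for this example. Each lost pixel is estimated as a weighted average of the nearest pixels just outside the block on its four sides, with nearer sides weighted more heavily.

```python
def conceal_block_bilinear(frame, top, left, size=16):
    """Return a copy of `frame` with one lost square block concealed.

    `frame` is a list of rows of sample values; the lost block covers
    rows top..top+size-1 and columns left..left+size-1. Pixels bordering
    the block are assumed (for this sketch) to be correctly received.
    """
    height, width = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for i in range(size):          # row offset inside the lost block
        for j in range(size):      # column offset inside the lost block
            # (row, column, weight) of the nearest pixel outside each side.
            # The weight is the distance to the opposite side, so the
            # closer neighbour contributes more.
            neighbours = [
                (top - 1, left + j, size - i),    # above
                (top + size, left + j, i + 1),    # below
                (top + i, left - 1, size - j),    # left
                (top + i, left + size, j + 1),    # right
            ]
            weighted_sum = 0.0
            weight_total = 0.0
            for y, x, weight in neighbours:
                # Skip sides that fall outside the picture.
                if 0 <= y < height and 0 <= x < width:
                    weighted_sum += weight * frame[y][x]
                    weight_total += weight
            if weight_total > 0:
                out[top + i][left + j] = weighted_sum / weight_total
    return out


# Toy example: an 8x8 horizontal ramp with a lost 4x4 block in the middle.
ramp = [[float(x) for x in range(8)] for _ in range(8)]
damaged = [row[:] for row in ramp]
for y in range(2, 6):
    for x in range(2, 6):
        damaged[y][x] = 0.0
concealed = conceal_block_bilinear(damaged, top=2, left=2, size=4)
```

A smooth ramp like the one in the toy example is recovered exactly, but interpolation of this kind blurs structured content such as eyes and mouth, which is consistent with the abstract's focus on destroyed ROI areas.
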
Keywords/Search Tags: Conversational Video Coding, Rate Control, Bit Allocation, Rate-Distortion Optimization, Coding Dependency, Error Concealment, Region of Interest (ROI), Human Face Detection, Facial Feature Extraction, Human Face Modeling