
Research On Key Techniques Of Vision-based Large Head Pose Tracking

Posted on: 2010-04-29 Degree: Doctor Type: Dissertation
Country: China Candidate: G Q Zhao
GTID: 1118360302958561 Subject: Computer Science and Technology
Abstract/Summary:
3D head pose tracking is an important research problem in computer vision and human-computer interaction, and it has recently become an increasingly active research direction. The principal objective of head pose tracking is to estimate the 3D pose parameters by analyzing an input image sequence. Head pose information can be widely employed in human-computer interaction, intelligent surveillance, video compression coding, face recognition, expression recognition, fatigue detection, body-controlled games, entertainment, etc.

Most existing head pose tracking methods fall into two categories: statistical learning-based methods and registration-based methods. Statistical learning-based methods assume that a relationship exists between certain facial features and 3D head poses, and they use a large number of training images to determine this relationship. These methods are sensitive to how the facial features are selected, and they usually need to interpolate the recovered pose parameters, so their results are not very accurate. Registration-based methods commonly assume that the head is a rigid object and estimate the pose parameters from feature correspondences between two frames. The selected features vary from one implementation to another. One approach selects distinctive features such as mouth corners, the nose tip, and eye corners; its tracking results become less precise when these features are occluded. The other approach selects facial features dynamically: it can automatically select new features during tracking when some features are lost, which makes it more robust. Generally speaking, registration-based methods are easy to implement and produce more precise results. However, when the head moves over a large range, it is difficult to register two frames that differ by a large pose change.
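Because registration-based methods treat the head as rigid, registering two frames amounts to finding a rotation R and translation t that map the 3D points of matched features in one frame onto those in the other. The following is a minimal sketch, assuming NumPy and already-matched 3D correspondences; it uses the SVD-based (Kabsch) least-squares solution inside a basic RANSAC loop to reject mismatched features. The function names `rigid_fit` and `ransac_rigid`, the inlier threshold, and the iteration count are illustrative choices, not the thesis's implementation.

```python
import numpy as np

def rigid_fit(P, Q):
    """Least-squares rotation R and translation t with Q ~= P @ R.T + t (Kabsch/SVD).
    P and Q are (n, 3) arrays of corresponding 3D points, n >= 3."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t

def ransac_rigid(P, Q, iters=200, thresh=0.01, seed=0):
    """Hypothetical RANSAC wrapper: fit on random 3-point samples, keep the
    hypothesis with the most inliers, then refit on all of its inliers."""
    rng = np.random.default_rng(seed)
    n = len(P)
    best = np.zeros(n, dtype=bool)
    for _ in range(iters):
        idx = rng.choice(n, 3, replace=False)      # minimal sample for a rigid motion
        R, t = rigid_fit(P[idx], Q[idx])
        err = np.linalg.norm(P @ R.T + t - Q, axis=1)
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("too few inliers to estimate a rigid motion")
    R, t = rigid_fit(P[best], Q[best])
    return R, t, best
```

Three non-collinear correspondences are the minimum that determine a rigid motion, which is why each hypothesis samples three points; refitting on all inliers then averages out noise in the surviving matches.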
In addition, frame-by-frame tracking accumulates drift over long sequences. To estimate 3D head pose parameters, registration methods also require the corresponding 3D information of the facial features. Existing head pose tracking methods generally assume that the subject has no body movement or only small body movement, e.g., the subject is sitting in a chair. In daily life, however, people express interest, attitude, and feeling through head pose either while staying in one place or while moving over a large range. In this thesis, large head pose tracking is defined as head pose tracking in the presence of body movement. Compared with ordinary head movement tracking, large head movement tracking can be applied more widely, for example in human-computer interaction, intelligent surveillance, and action recognition.

This thesis addresses large head movement tracking by using local descriptors to detect and match facial features. The whole process consists of three steps. First, acquire the image and its corresponding depth map; the depth map is obtained either from a stereo camera or by 3D reconstruction techniques. Second, register two frames and estimate the head pose change. Third, reduce the drift accumulated by frame-by-frame tracking with an appearance model, which also helps the tracker recover automatically after it loses the pose. Compared with existing work, our main contributions are as follows:

1. We propose a novel Scale Invariant Feature Transform (SIFT) based registration algorithm. Salient SIFT features are first detected and tracked between two images, and the 3D points corresponding to these features are then obtained from a stereo camera or from 3D reconstruction. With these 3D points, a registration algorithm in a RANSAC framework detects the outliers and estimates the head pose.
Because SIFT features are scale invariant, two frames can be accurately registered even when the scale of the head also changes between them, which makes the proposed SIFT-based registration algorithm appropriate for large head movement tracking. It is the first registration algorithm designed for large head movement tracking, and the related paper has been cited by other researchers.

2. We propose a new compact feature descriptor called Kernel Projection Based SIFT (KPB-SIFT). It first detects interest points with the SIFT feature detector and then applies kernel projection techniques to the orientation gradient information in each feature point's neighborhood. KPB-SIFT is significantly faster in the descriptor matching stage and shows superior distinctiveness, invariance to scale, and tolerance of geometric distortions.

3. To reduce drift accumulation when tracking over a large range, we propose a view-based appearance model that selects key frames online as the head undergoes different motions. Each key frame is annotated with its pose and head region, and together the key frames represent the appearance of the subject viewed from these estimated poses. To bound drift, our tracker first registers the current frame against the previous frame using the SIFT-based registration algorithm. It then selects a key frame as the base frame if that key frame's view of the head is sufficiently similar to the current frame's view, and registers the current frame against the base frame with the same algorithm. Finally, the pose of the current frame is obtained by merging the results of the two registrations with a Kalman filter.

4. We present a novel local image descriptor for dense wide-baseline matching, called SULD (Speeded-Up Local Descriptor). SULD is built in four stages. First, the input image is convolved with Haar wavelet filters. Second, the response maps are smoothed with Gaussian kernels.
Third, sample locations are computed and the corresponding sample vectors are taken from the smoothed response maps. Finally, the sample vectors are normalized and concatenated to form the SULD descriptor. Thanks to efficient Haar wavelet filters and integral image techniques, SULD can be computed and matched much faster. SULD can be used for dense matching of texture-less face image pairs, and the resulting depth information can be provided to monocular camera based head pose tracking. Face depth information is also widely used in human-computer interaction, expression recognition, body-controlled games, and entertainment.

During the research, we designed and implemented a head pose tracking demo system named HPObserver, which supports video collection, depth production, pose estimation, and performance evaluation. HPObserver supports both ongoing research and future work.

To evaluate the performance of the proposed approach, we conducted experiments on dozens of image sequences. The extensive experiments show that the proposed approach obtains robust results even under large body movement, when the subject returns to the camera's field of view after abruptly leaving it, when the subject's facial expression varies, and when occlusion occurs. Finally, we analyze the remaining problems and discuss future directions.
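The abstract does not say which kernels KPB-SIFT projects onto. The sketch below assumes Walsh-Hadamard kernels (rows of a Sylvester Hadamard matrix, sorted by sequency) and, as a simplification, projects the plain x/y gradients of a square patch around a keypoint rather than SIFT's orientation histograms. It illustrates why kernel projection yields a compact descriptor: keeping only the k x k lowest-sequency coefficients of each gradient map gives 2k² dimensions (32 for k = 4, versus 128 for standard SIFT). The names `hadamard_sequency` and `kernel_projection_descriptor` and the value of k are hypothetical.

```python
import numpy as np

def hadamard_sequency(n):
    """Sylvester Hadamard matrix (n must be a power of 2), rows sorted by the
    number of sign changes (sequency), so low-frequency kernels come first."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    changes = (np.diff(H, axis=1) != 0).sum(axis=1)
    return H[np.argsort(changes, kind="stable")]

def kernel_projection_descriptor(patch, k=4):
    """Assumed KPB-style descriptor: project the x and y gradients of a square
    n x n patch onto the k x k lowest-sequency 2D Walsh-Hadamard kernels,
    concatenate the coefficients and L2-normalise them."""
    patch = np.asarray(patch, dtype=float)
    n = patch.shape[0]
    Hk = hadamard_sequency(n)[:k]                   # k lowest-sequency 1D kernels
    gy, gx = np.gradient(patch)                     # d/drow (y), d/dcol (x)
    coeffs = [Hk @ g @ Hk.T for g in (gx, gy)]      # separable 2D projection, k x k each
    v = np.concatenate([c.ravel() for c in coeffs])
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

Since the descriptor is an L2-normalised projection of gradients, a uniform brightness offset or contrast scaling of the patch leaves it unchanged, and matching compares 32-dimensional rather than 128-dimensional vectors, which is the kind of saving a compact descriptor offers in the matching stage.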
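The drift-bounding step of contribution 3 (choose a base key frame whose view is close to the current pose, then merge the two registration results with a Kalman filter) can be sketched as follows. This is a simplified illustration under explicit assumptions: poses are 6-vectors [rx, ry, rz, tx, ty, tz] with rotations in radians, rotations are combined additively (a small-angle approximation), the frame-to-frame increment drives the prediction while the key-frame registration is the measurement, and view similarity is measured as the distance between rotation vectors. The names `select_key_frame` and `fuse_poses` and the 10-degree threshold are hypothetical.

```python
import numpy as np

def select_key_frame(key_poses, pose, max_angle_deg=10.0):
    """Index of the key frame whose rotation is closest to `pose`,
    or None if no key frame is within `max_angle_deg`."""
    if len(key_poses) == 0:
        return None
    d = np.linalg.norm(np.asarray(key_poses)[:, :3] - pose[:3], axis=1)
    i = int(np.argmin(d))
    return i if d[i] <= np.deg2rad(max_angle_deg) else None

def fuse_poses(x_prev, P_prev, dx_ff, Q_ff, z_key=None, R_key=None):
    """One Kalman predict/update step on the 6-D pose (identity measurement model).
    Predict with the frame-to-frame increment dx_ff (covariance Q_ff); if a key
    frame was matched, correct with its registration result z_key (covariance R_key)."""
    x_pred = x_prev + dx_ff
    P_pred = P_prev + Q_ff
    if z_key is None:                      # no similar key frame: drift is not bounded
        return x_pred, P_pred
    S = P_pred + R_key                     # innovation covariance
    K = P_pred @ np.linalg.inv(S)          # Kalman gain
    x = x_pred + K @ (z_key - x_pred)
    P = (np.eye(len(x)) - K) @ P_pred
    return x, P
```

When the two estimates have equal covariance the fused pose is their midpoint; as the key-frame covariance grows, the result falls back to the frame-to-frame prediction, which is exactly the situation in which drift can accumulate.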
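The four SULD stages can be illustrated with a sketch that computes Haar wavelet responses in constant time per pixel from an integral image, smooths them with a separable Gaussian, samples a grid around a pixel, and normalises the concatenated samples. The abstract gives only the stage outline, so the split of the responses into four rectified channels, the 4 x 4 sampling grid, and all parameter values are assumptions, and every function name is hypothetical. For dense matching, the smoothed maps are computed once per image and then sampled at every pixel.

```python
import numpy as np

def integral_image(img):
    """Zero-padded summed-area table: ii[y, x] = img[:y, :x].sum()."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in O(1); the indices may be arrays."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

def haar_responses(img, s=2):
    """Stage 1: Haar x/y responses on the window img[y-s:y+s, x-s:x+s].
    Output index (r, c) corresponds to image pixel (r + s, c + s)."""
    img = np.asarray(img, dtype=float)
    ii = integral_image(img)
    h, w = img.shape
    ys, xs = np.mgrid[s:h - s + 1, s:w - s + 1]
    right = box_sum(ii, ys - s, xs, ys + s, xs + s)
    left = box_sum(ii, ys - s, xs - s, ys + s, xs)
    bottom = box_sum(ii, ys, xs - s, ys + s, xs + s)
    top = box_sum(ii, ys - s, xs - s, ys, xs + s)
    return right - left, bottom - top

def gaussian_smooth(m, sigma):
    """Stage 2: separable Gaussian smoothing with edge padding (output same size)."""
    r = max(1, int(round(3 * sigma)))
    g = np.exp(-0.5 * (np.arange(-r, r + 1) / sigma) ** 2)
    g /= g.sum()
    conv = lambda v: np.convolve(np.pad(v, r, mode="edge"), g, mode="valid")
    return np.apply_along_axis(conv, 1, np.apply_along_axis(conv, 0, m))

def response_maps(img, s=2, sigma=1.5):
    """Stages 1-2, once per image: four rectified, smoothed Haar maps."""
    dx, dy = haar_responses(img, s)
    chans = [np.maximum(dx, 0), np.maximum(-dx, 0), np.maximum(dy, 0), np.maximum(-dy, 0)]
    return np.stack([gaussian_smooth(c, sigma) for c in chans])

def sample_descriptor(maps, s, cy, cx, step=3, grid=4):
    """Stages 3-4: sample a grid x grid pattern around image pixel (cy, cx),
    concatenate the samples of all maps and L2-normalise."""
    offs = (np.arange(grid) - (grid - 1) / 2) * step
    ys = np.clip(np.round(cy - s + offs).astype(int), 0, maps.shape[1] - 1)
    xs = np.clip(np.round(cx - s + offs).astype(int), 0, maps.shape[2] - 1)
    v = maps[:, ys[:, None], xs[None, :]].ravel()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```

Each Haar response costs four table lookups per box regardless of the filter size, which is the property of integral images that makes dense per-pixel description affordable.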
Keywords/Search Tags: Human-computer interaction, head pose tracking, registration algorithm, local descriptors, dense stereo matching, head motion