Monocular Vision Based 3D Reconstruction And Human Action Recognition From Skeleton Data

Posted on: 2019-11-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Li
GTID: 1368330623453315
Subject: Information and Communication Engineering

Abstract/Summary:
3D reconstruction and human action recognition are fundamental problems in machine vision and image processing. Compared with multi-view systems, monocular systems are far more widespread in real life. It is therefore of great theoretical and practical value to recover 3D information from a monocular system and to recognize actions from that 3D information. Related research can be widely applied in 3D scene modeling, autonomous driving, augmented reality, human-machine interaction and other fields. Although vision-based 3D reconstruction and action recognition have made great progress in recent years, many challenges remain, such as processing complex time-varying scenes, analyzing the uncertainty of reconstruction ability, and fusing data-driven and geometry-constrained frameworks for reconstruction and semantic perception. Against this background, this dissertation studies a series of key technologies for monocular-vision-based 3D reconstruction and 3D-data-based action recognition. More specifically, it focuses on three sub-problems: monocular image depth estimation, 3D reconstruction of articulated trajectories, and action recognition based on 3D human skeletons. The main contributions are summarized as follows:
· To address discriminative feature learning and depth-map post-refinement, we propose a novel monocular image depth estimation method based on superpixel depth regression and a hierarchical conditional random field (CRF). Given an image, we first compute its superpixels. For each superpixel, we extract multi-scale image patches around the superpixel center. A deep CNN is then trained to encode the relationship between the input patches and the corresponding depth. However, a depth map obtained by superpixel regression alone lacks overall structural information: it is not smooth and shows an obvious mosaic effect. To address this, we refine the depth estimate from the superpixel level to the pixel level by inference on a hierarchical CRF. Importantly, our MAP inference problem has a closed-form solution and is therefore computationally efficient. Experiments on the NYU v2 and KITTI benchmarks show that our method produces high-quality depth maps.
· We propose a novel monocular image depth estimation method based on a fully convolutional neural network and soft-weighted-sum inference. First, in contrast to regression-based formulations, we cast monocular depth estimation as a multi-category dense labeling task, which allows the model to be trained with an effective cross-entropy loss. Second, we hierarchically fuse the side outputs of our front-end dilated convolutional neural network to exploit multi-scale depth cues, which is critical for scale-aware depth estimation. Third, we use soft-weighted-sum inference instead of hard-max inference to transform the discretized depth scores into continuous depth values. This reduces the influence of quantization error and improves the robustness of our method. Extensive experiments on the NYU Depth V2 and KITTI datasets show the superiority of our method over current state-of-the-art methods.
· We present a novel, robust relaxation-based method for articulated trajectory reconstruction from a monocular image sequence. We propose a relaxation-based objective function that uses both smoothness and geometric constraints, posing articulated trajectory reconstruction as a nonlinear optimization problem. The main advantage of this approach is that it retains the reconstructive power of the original algorithm while improving its robustness to the inevitable noise in the data. Furthermore, we present an effective approach to estimating the hyper-parameters of the objective function, which greatly improves the practicability and reliability of the algorithm. Experimental results on the CMU motion capture dataset show that the proposed algorithm is effective.
· For skeleton-based human action classification, we present a novel approach built on image classification. First, we propose a video-domain, translation-scale invariant image mapping that transforms 3D skeleton videos into color images, called skeleton images, while preserving temporal order. Second, we design a multi-scale dilated convolutional neural network (CNN) to classify the skeleton images. Furthermore, we propose several data augmentation strategies to improve the generalization and robustness of our method. Experimental results on popular benchmark datasets such as NTU RGB+D, UTD-MHAD, MSRC-12 and G3D show that our approach outperforms state-of-the-art methods by a large margin.
· For skeleton-based human action detection, we propose a novel CNN-based method consisting of two parts: a skeleton-based video-to-image mapping, and an end-to-end trainable fast skeleton action detector (Skeleton Boxes) built on a general image detection framework. Our action detector is based on SSD and Faster R-CNN, two popular image detection frameworks. In addition, we propose an asymmetric convolution kernel, spatial-dimension pooling and a one-dimensional region proposal network to reduce unnecessary computation and improve on the performance of the original frameworks. Notably, we scan the skeleton image with a one-dimensional sliding window instead of rescaling it directly, in order to preserve the frequency-domain characteristics of the image itself. Experimental results on PKU-MMD, the latest and largest benchmark dataset, show that our method outperforms state-of-the-art methods by a large margin. Moreover, extensive experimental analysis demonstrates the effectiveness of our method.
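The first contribution states that MAP inference on the hierarchical CRF has a closed-form solution. The abstract does not give the energy, so the following is only a minimal sketch under an assumed single-level Gaussian CRF: a unary term ties each pixel to the depth regressed for its superpixel, and a pairwise term penalizes depth differences between 4-connected neighbours with one hypothetical weight `lam`. Because the energy is quadratic, setting its gradient to zero yields a single linear system, (I + lam * L) y = z, where L is the grid's graph Laplacian. The names `grid_laplacian` and `gaussian_crf_map` are hypothetical, and the dense solve is only practical for small images.

```python
import numpy as np

def grid_laplacian(h, w):
    """Graph Laplacian L = D - A of a 4-connected h x w pixel grid (dense, small images only)."""
    n = h * w
    A = np.zeros((n, n))
    idx = np.arange(n).reshape(h, w)
    # Horizontal neighbour pairs.
    a, b = idx[:, :-1].ravel(), idx[:, 1:].ravel()
    A[a, b] = A[b, a] = 1.0
    # Vertical neighbour pairs.
    a, b = idx[:-1, :].ravel(), idx[1:, :].ravel()
    A[a, b] = A[b, a] = 1.0
    return np.diag(A.sum(axis=1)) - A

def gaussian_crf_map(unary_depth, lam=1.0):
    """Closed-form MAP of an ASSUMED Gaussian CRF energy:

        E(y) = sum_i (y_i - z_i)^2 + lam * sum_{i~j} (y_i - y_j)^2

    where z is the (mosaic-like) superpixel depth map. dE/dy = 0 gives
    (I + lam * L) y = z, solved directly here.
    """
    h, w = unary_depth.shape
    z = unary_depth.ravel()
    L = grid_laplacian(h, w)
    y = np.linalg.solve(np.eye(h * w) + lam * L, z)
    return y.reshape(h, w)

# Usage: a piecewise-constant "superpixel" depth map with a hard edge.
z = np.ones((4, 4))
z[:, 2:] = 3.0
y = gaussian_crf_map(z, lam=1.0)
```

Because every row of L sums to zero and I + lam * L is a nonsingular M-matrix, each refined pixel is a convex combination of the unary depths: the mosaic edge is smoothed while the overall depth range and total depth are preserved.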
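The second contribution treats depth estimation as classification over discretized depth bins and then recovers continuous depth with soft-weighted-sum inference. A minimal sketch follows, assuming bins spaced uniformly in log-depth between hypothetical limits `d_min` and `d_max` (the abstract does not state the discretization scheme). Hard-max inference returns the centre of the highest-scoring bin, so its output is restricted to the bin grid. Soft-weighted-sum inference returns the probability-weighted average of all bin centres, which is continuous and therefore less affected by quantization error. All function names are illustrative.

```python
import numpy as np

def log_bin_centers(d_min, d_max, k):
    """Centres of k depth bins spaced uniformly in log-depth (an assumed scheme)."""
    edges = np.linspace(np.log(d_min), np.log(d_max), k + 1)
    return np.exp(0.5 * (edges[:-1] + edges[1:]))

def depth_to_label(depth, d_min, d_max, k):
    """Quantize ground-truth depth to a bin index in [0, k-1], i.e. the cross-entropy target."""
    t = (np.log(depth) - np.log(d_min)) / (np.log(d_max) - np.log(d_min))
    return np.clip((t * k).astype(int), 0, k - 1)

def softmax(scores, axis=-1):
    s = scores - scores.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(s)
    return e / e.sum(axis=axis, keepdims=True)

def hard_max_depth(scores, centers):
    """Hard-max inference: depth of the single most likely bin (quantized)."""
    return centers[np.argmax(scores, axis=-1)]

def soft_weighted_sum_depth(scores, centers):
    """Soft-weighted-sum inference: expected depth under the per-pixel bin distribution.

    scores has shape (..., k); the result has shape (...,).
    """
    return softmax(scores) @ centers

# Usage: two equally likely neighbouring bins.
centers = log_bin_centers(1.0, 10.0, 4)
scores = np.array([0.0, 5.0, 5.0, 0.0])
hard = hard_max_depth(scores, centers)          # snaps to one bin centre
soft = soft_weighted_sum_depth(scores, centers)  # lands between the two centres
```

With two equally likely neighbouring bins, hard-max snaps to one of them, while the soft-weighted sum lands between them; this difference is what reduces the quantization error.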
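The third contribution combines a data term with smoothness and geometric constraints in a relaxation-based objective. The abstract specifies neither the camera model nor the exact terms, so the following is only an illustrative sketch under stated assumptions: an orthographic camera (the observed 2D point is the x, y of the 3D point), a second-difference smoothness penalty on joint motion, and the hard constant-bone-length constraint "relaxed" into a squared penalty. The weights `lam_s` and `lam_g` are hypothetical stand-ins for the hyper-parameters that the dissertation estimates automatically.

```python
import numpy as np

def relaxed_energy(X, W, bones, lengths, lam_s=1.0, lam_g=1.0):
    """Energy of a candidate 3D trajectory under ASSUMED terms.

    X: (T, J, 3) candidate 3D joint positions over T frames.
    W: (T, J, 2) observed 2D joint positions.
    bones: list of (i, j) joint-index pairs; lengths: matching bone lengths.
    """
    X = np.asarray(X, dtype=np.float64)
    # Data term (assumption: orthographic projection keeps x and y).
    data = np.sum((X[..., :2] - W) ** 2)
    # Smoothness term: penalize acceleration (second temporal difference).
    acc = X[2:] - 2.0 * X[1:-1] + X[:-2]
    smooth = np.sum(acc ** 2)
    # Geometric constraint relaxed into a penalty on squared bone lengths.
    geo = 0.0
    for (i, j), l in zip(bones, lengths):
        sq = np.sum((X[:, i] - X[:, j]) ** 2, axis=-1)  # (T,) squared lengths
        geo += np.sum((sq - l ** 2) ** 2)
    return data + lam_s * smooth + lam_g * geo

# Usage: a two-joint chain moving at constant velocity with a unit bone along z.
T = 5
t = np.arange(T, dtype=float)
root = np.stack([t, 2.0 * t, np.zeros(T)], axis=1)              # (T, 3)
X = np.stack([root, root + np.array([0.0, 0.0, 1.0])], axis=1)  # (T, 2, 3)
W = X[..., :2].copy()

bad = X.copy()
bad[2, 1, 2] += 0.5            # stretch the bone in one frame
flipped = X.copy()
flipped[:, 1, 2] = -1.0        # mirror the child joint in depth
```

Such an energy could be minimized by a generic nonlinear optimizer (for example `scipy.optimize.minimize` on the flattened X). The `flipped` case also has zero energy, which illustrates the depth-reflection ambiguity of orthographic views: the smoothness term can only penalize such flips when they occur partway through a sequence.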
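The fourth contribution maps a 3D skeleton video to a color "skeleton image" whose rows are joints, whose columns are frames (so temporal order is preserved), and whose R, G, B channels encode the x, y, z coordinates. A minimal sketch follows, assuming that "video domain" means the normalization statistics are taken over the whole sequence rather than per frame, and that each coordinate axis uses its own min/max. This assumption makes the result invariant to any 3D translation and to positive scaling of the sequence. The function name and the 0-255 quantization are illustrative.

```python
import numpy as np

def skeleton_to_image(seq):
    """Map a skeleton sequence (T frames, J joints, 3 coords) to a uint8 image (J, T, 3).

    Assumption: min/max are computed per axis over the whole video, so translating
    or uniformly rescaling the sequence leaves the image unchanged.
    """
    seq = np.asarray(seq, dtype=np.float64)
    lo = seq.min(axis=(0, 1))                      # (3,) per-axis minimum over the video
    hi = seq.max(axis=(0, 1))                      # (3,) per-axis maximum over the video
    scale = np.where(hi > lo, hi - lo, 1.0)        # guard against a constant axis
    norm = (seq - lo) / scale                      # removes translation and scale
    img = np.round(255.0 * norm).astype(np.uint8)
    return img.transpose(1, 0, 2)                  # (T, J, 3) -> (J, T, 3)

# Usage: a random 40-frame, 25-joint sequence and a translated, rescaled copy.
rng = np.random.default_rng(0)
seq = rng.normal(size=(40, 25, 3))
img = skeleton_to_image(seq)
moved = 2.0 * seq + np.array([1.0, -2.0, 0.5])
```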
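For action detection, the abstract states that the skeleton image is scanned with a one-dimensional sliding window along the time axis instead of being rescaled. As a result, the temporal sampling, and hence the frequency content, of each crop is unchanged. A minimal sketch of such windowing follows. The window `width`, the `stride`, the extra final window that covers the tail, and the zero-padding of sequences shorter than one window are all hypothetical choices, not values from the dissertation.

```python
import numpy as np

def temporal_windows(skel_img, width, stride):
    """Cut a skeleton image (J joints, T frames, C channels) into fixed-width time crops.

    Returns (starts, crops), where crops has shape (N, J, width, C). Crops are never
    resized; a sequence shorter than `width` is zero-padded on the right instead.
    """
    j, t, c = skel_img.shape
    last = max(t - width, 0)
    starts = list(range(0, last + 1, stride))
    if starts[-1] < last:
        starts.append(last)  # add one window so the end of the sequence is covered
    crops = np.zeros((len(starts), j, width, c), dtype=skel_img.dtype)
    for n, s in enumerate(starts):
        piece = skel_img[:, s:s + width]
        crops[n, :, :piece.shape[1]] = piece
    return starts, crops

# Usage: a 100-frame skeleton image scanned with 32-frame windows.
img = np.arange(25 * 100 * 3).reshape(25, 100, 3)
starts, crops = temporal_windows(img, width=32, stride=16)
short_starts, short_crops = temporal_windows(np.ones((25, 10, 3)), width=32, stride=16)
```

Because crops are never rescaled, a detection at columns [a, b) of window n maps back to frames [starts[n] + a, starts[n] + b) of the original video by a simple offset.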
Keywords/Search Tags: monocular image, depth estimation, deep learning, CRF, articulated trajectory reconstruction, skeleton video, action classification, action detection