
Hand Segmentation In Image And Video

Posted on: 2020-04-05 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: M L Li | Full Text: PDF
GTID: 1368330578481670 | Subject: Electronic Science and Technology
Abstract/Summary:
With the rapid development of intelligent terminals and visual computing technologies, vision-based human-computer interaction (HCI), especially interaction via hands, is increasingly common in daily life. For hand-based interaction applications such as gesture recognition, hand pose estimation, and hand-object interaction analysis, hand segmentation is a key component. Although hand segmentation has been studied for decades, it remains immature in both accuracy and efficiency for practical use. This thesis explores efficient and accurate hand segmentation in images and videos, and proposes three approaches.

First, this thesis proposes an approach to precise hand segmentation from a single depth image, consisting mainly of a segmentation proposal generation module and a proposal evaluation module. Given a depth image, it is easy to extract a rough hand region of interest (ROI) from the depth map according to depth information, yet it is non-trivial to precisely separate the hand from the rough ROI due to various challenges. Observing that hand ROIs often have ribbon-like shapes, we propose a method that generates redundant hand segmentation proposals along the arm orientation of a hand ROI via constrained Delaunay triangulation (CDT). We then explore both an R-CNN-like framework and a Fast R-CNN-like framework to predict a confidence score for each proposal, quantifying how well it matches the ground-truth segmentation. Finally, the proposal with the highest confidence score is selected as the hand segmentation result. Experiments on two large depth datasets demonstrate that this approach achieves higher segmentation accuracy than previous approaches.

Second, this thesis proposes an approach to accurate and efficient hand segmentation from depth videos, which extends CNN-based hand segmentation from depth images to video by leveraging the continuity among adjacent frames. It consists of two main branches: flow-guided feature propagation and light-weight detail enhancement. The flow-guided feature propagation branch runs an image hand segmentation network only on sparse frames and warps intermediate features to the other frames according to the cross-frame flow field. Compared with per-frame inference, this yields a significant speedup, but also a large accuracy degradation due to distortion in the propagation. To relieve this problem while maintaining high efficiency, we introduce a light-weight detail enhancement branch, which extracts low-level detail features from the current frame to enhance the propagated features. Experiments on a large public depth video dataset demonstrate the effectiveness and efficiency of this approach.

Lastly, this thesis proposes an approach to accurate and efficient hand segmentation from egocentric color videos. Building on flow-guided feature propagation and light-weight detail enhancement, this approach introduces an additional occlusion attention module to better handle the occlusion issues of feature propagation in egocentric videos. Concretely, it predicts a soft occlusion map that estimates the occlusion degree of each point in the propagation. Based on the soft occlusion map, spatial attention is applied to the detail enhancement features extracted from the current frame by the detail enhancement branch. The occlusion-attended enhancement features are then fused with the propagated features and fed to a decoder to predict the segmentation result. Experiments on three large public egocentric video datasets demonstrate that this approach achieves a better accuracy-latency tradeoff than previous approaches.
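The propose-and-score idea behind the single-image approach can be caricatured in a few lines. The sketch below is a minimal stand-in, not the thesis method: the arm orientation comes from plain PCA rather than constrained Delaunay triangulation, proposals are simple straight cuts along that axis, and `score_fn` is a caller-supplied stub where the thesis uses an R-CNN-like confidence network.

```python
import numpy as np

def generate_proposals(mask, n_cuts=5):
    """Cut a ribbon-like ROI at evenly spaced positions along its long
    axis; each cut keeps the far end of the ribbon as one proposal."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    pts -= pts.mean(axis=0)
    cov = pts.T @ pts / len(pts)
    eigvals, eigvecs = np.linalg.eigh(cov)
    axis = eigvecs[:, np.argmax(eigvals)]      # long axis (PCA stand-in for arm orientation)
    proj = pts @ axis                          # pixel coordinate along that axis
    proposals = []
    for t in np.linspace(proj.min(), proj.max(), n_cuts + 2)[1:-1]:
        prop = np.zeros_like(mask)
        keep = proj >= t                       # pixels beyond the cut line
        prop[ys[keep], xs[keep]] = 1
        proposals.append(prop)
    return proposals

def select_proposal(proposals, score_fn):
    """The evaluation module scores every proposal (a CNN in the thesis,
    any callable here) and keeps the highest-confidence one."""
    return max(proposals, key=score_fn)
```

With a horizontal ribbon mask, the cuts produce nested proposals of shrinking area, and the selected one is simply whichever the scorer prefers.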
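The video pipelines of the second and third approaches share one skeleton: warp key-frame features along a flow field, estimate a soft occlusion map, and use it as spatial attention when fusing in cheap per-frame detail features. The sketch below is a hedged stand-in for the learned components — nearest-neighbour warping instead of bilinear, photometric error instead of a learned occlusion predictor, and a convex blend instead of the thesis's fusion-plus-decoder.

```python
import numpy as np

def warp_features(feat, flow):
    """Propagate key-frame features (H x W x C) to the current frame along a
    flow field (H x W x 2); nearest-neighbour sampling keeps it dependency-free."""
    h, w, _ = feat.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    return feat[src_y, src_x]

def soft_occlusion_map(warped_frame, current_frame):
    """Per-pixel occlusion degree in (0, 1), here read off the photometric
    error between the warped key frame and the current frame."""
    err = np.abs(warped_frame - current_frame).mean(axis=-1)
    return 1.0 / (1.0 + np.exp(-(err - err.mean())))

def fuse_with_attention(propagated, detail, occlusion):
    """Spatial attention: where propagation is likely occluded or distorted,
    lean on the light-weight detail-enhancement features instead."""
    a = occlusion[..., None]
    return (1.0 - a) * propagated + a * detail
```

A decoder would then map the fused features to the final segmentation mask; that part is omitted here.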
Keywords/Search Tags:Hand Segmentation, Constrained Delaunay Triangulation, Convolutional Neural Network, Flow-guided Propagation, Detail Enhancement, Occlusion Attention