
Deep Learning Based Key-point Localization Algorithms

Posted on: 2020-11-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Chen
GTID: 1488306512982469
Subject: Control Science and Engineering
Abstract/Summary:
Key-point localization (i.e., key-point detection) is a fundamental problem in many computer vision tasks. Common topics in key-point localization include facial landmark localization and human pose estimation (i.e., human body joint localization). In recent years, with the growing demand for human-computer interaction devices, key-point localization has become a popular research topic and an indispensable part of related products. This thesis addresses four key problems in key-point localization: robustness to poses and occlusions, the design of an elegant end-to-end framework, a unified framework for different tasks, and robustness to low-resolution input images. In terms of methods, we start by using deep neural networks (DNNs) to learn a better mapping from traditional hand-crafted features. Then, by designing deep convolutional neural networks, we learn feature representations and mappings simultaneously in an effective and elegant framework. In terms of applications, we first focus on facial landmark localization. Then, by proposing a more powerful model, we perform facial landmark localization and human pose estimation within the same framework. Finally, we also handle low-resolution and blurred input images. The details are as follows:
1. An auto-encoder network is developed to learn the mapping from features to coordinates for facial landmark localization. A DNN first estimates the initial landmarks and the pose, and samples are assigned to different pose groups to reduce the non-linearity caused by large pose variations. Cascaded DNNs then refine the landmarks in a coarse-to-fine manner. In this way, the complexity of the feature mapping is reduced, and the DNN-based mapping outperforms traditional methods in face alignment.
2. Recurrent neural networks are developed for facial landmark localization in both videos and single images. For videos, a recurrent neural network models the temporal dependency of faces in a sequence, so that landmarks on faces with large poses and expressions are predicted well through their connections to neutral faces in the same sequence. For single images, to find a better feature representation, we investigate the spatial relationships between the features of different landmarks with a spatial recurrent neural network (RNN). Facial structural information is built from local features, and a long short-term memory (LSTM) module captures the spatial relationships between non-adjacent facial landmarks. Learning the joint feature representation and the feature-to-coordinate mapping simultaneously yields more accurate landmarks.
3. A unified framework based on adversarial learning and convolutional neural networks (CNNs) is proposed for facial landmark detection and human pose estimation. Because landmark localization is an extremely complex problem, predicting landmarks directly from images was previously considered difficult. To address this, we propose the following strategies. First, to overcome the limited representational power of hand-crafted features, we develop a deep CNN that predicts landmarks in the form of heatmaps. Second, because CNNs sometimes produce implausible results, we introduce an adversarial learning strategy. The framework is end-to-end, which avoids the complex cascaded strategies used previously. The method is applied to facial landmark estimation and human pose estimation in both 2D and 3D, and shows excellent performance on both tasks.
4. A joint face super-resolution (SR) and prior estimation network is proposed. Most previous landmark localization methods cannot handle low-resolution or blurred images, and previous face SR methods make little use of the abundant facial priors. To address both problems, we propose the following method. First, a coarse SR network produces an initial SR result. Then, a prior estimation network estimates the facial priors from the coarse SR result. These priors serve as prior features and are used jointly with the deep features of a deep CNN to extract deep SR features. Finally, a decoder produces the SR output from the prior and image features. The method thus performs face SR and facial prior estimation in a single end-to-end network. Predicting on low-resolution inputs, it achieves state-of-the-art prior estimation performance, even when compared with other methods that operate on the ground-truth high-resolution images.
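The first contribution combines pose grouping with cascaded coarse-to-fine regression. Its control flow can be sketched as follows, assuming three pose groups split by an estimated yaw angle at ±30 degrees and a separate list of refinement stages per group, each stage returning per-landmark (dx, dy) corrections. The thresholds, the names `assign_pose_group` and `cascaded_refine`, and the toy stages in the usage example are hypothetical; in the thesis the initial estimate and every stage are DNNs operating on image features, which this sketch replaces with plain callables.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Shape = List[Point]
# Hypothetical stage interface: takes the current shape, returns per-landmark (dx, dy).
# In the thesis each stage is a DNN on image features; here it is any callable.
Stage = Callable[[Shape], List[Point]]


def assign_pose_group(yaw_degrees: float,
                      boundaries: Sequence[float] = (-30.0, 30.0)) -> int:
    """Map an estimated yaw angle to a pose-group index.

    With the assumed default boundaries: 0 = left profile, 1 = near-frontal,
    2 = right profile.
    """
    return sum(1 for b in boundaries if yaw_degrees >= b)


def cascaded_refine(initial_shape: Shape, yaw_degrees: float,
                    stages_per_group: Dict[int, List[Stage]]) -> Shape:
    """Refine landmarks coarse-to-fine using only the stages of the selected pose group."""
    group = assign_pose_group(yaw_degrees)
    shape = list(initial_shape)
    for stage in stages_per_group[group]:
        updates = stage(shape)
        # Each stage adds a residual correction to the current estimate.
        shape = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(shape, updates)]
    return shape


if __name__ == "__main__":
    target = [(10.0, 20.0), (30.0, 20.0)]

    def halfway(shape: Shape) -> List[Point]:
        # Toy stand-in for a trained stage: moves each point halfway to `target`.
        return [((tx - x) * 0.5, (ty - y) * 0.5)
                for (x, y), (tx, ty) in zip(shape, target)]

    stages = {g: [halfway, halfway, halfway] for g in range(3)}
    print(cascaded_refine([(0.0, 0.0), (40.0, 40.0)], 5.0, stages))
    # [(8.75, 17.5), (31.25, 22.5)]
```

Training a separate cascade per pose group means each group's regressors only need to cover a narrower range of poses, which is the non-linearity reduction the abstract describes; the residual shrinks with every stage, giving the coarse-to-fine behavior.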
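The spatial RNN in the second contribution uses an LSTM to pass information between landmark features. To illustrate only the standard LSTM update, the sketch below runs a tiny pure-Python LSTM cell over a sequence of per-landmark feature vectors, so each hidden state carries context from every landmark visited before it. The landmark ordering, the feature and hidden sizes, the random weights, and the names `TinyLSTMCell` and `run_over_landmarks` are all hypothetical; the thesis's actual spatial RNN design is not reproduced here.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyLSTMCell:
    """A minimal LSTM cell (input size D, hidden size H) with random, untrained weights."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
        # Each row maps the concatenated [x, h_prev] vector to one hidden unit.
        self.weights = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(input_size + hidden_size)]
                   for _ in range(hidden_size)]
            for gate in "ifog"
        }
        self.biases = {gate: [0.0] * hidden_size for gate in "ifog"}

    def _affine(self, gate: str, xh: Vector) -> Vector:
        return [sum(w * v for w, v in zip(row, xh)) + b
                for row, b in zip(self.weights[gate], self.biases[gate])]

    def step(self, x: Vector, h_prev: Vector, c_prev: Vector) -> Tuple[Vector, Vector]:
        xh = x + h_prev  # concatenate input and previous hidden state
        i = [sigmoid(z) for z in self._affine("i", xh)]
        f = [sigmoid(z) for z in self._affine("f", xh)]
        o = [sigmoid(z) for z in self._affine("o", xh)]
        g = [math.tanh(z) for z in self._affine("g", xh)]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c


def run_over_landmarks(cell: TinyLSTMCell, landmark_features: List[Vector]) -> List[Vector]:
    """Visit landmark features in a fixed order; each output depends on all earlier ones."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    outputs = []
    for x in landmark_features:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs


if __name__ == "__main__":
    # Hypothetical 4-dim local features for 5 landmarks, e.g. ordered along the jaw line.
    rng = random.Random(42)
    feats = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
    outs = run_over_landmarks(TinyLSTMCell(input_size=4, hidden_size=3), feats)
    print(len(outs), len(outs[0]))  # 5 3
```

Because the cell state is carried forward through the forget gate, the output for a late landmark can still depend on a landmark visited much earlier, which is how an LSTM can relate non-adjacent landmarks.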
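Heatmap-based landmark prediction, as in the third contribution, commonly encodes each key-point as a 2D Gaussian centered on its coordinates and decodes a predicted heatmap by locating its peak. The following is a minimal sketch of that common encode/decode step in plain Python, assuming one single-channel heatmap per landmark, a Gaussian width `sigma` in pixels, and integer-pixel argmax decoding. The function names are hypothetical, and the adversarial part of the framework is not shown.

```python
import math
from typing import List, Tuple

Heatmap = List[List[float]]


def render_gaussian_heatmap(height: int, width: int,
                            center: Tuple[float, float],
                            sigma: float = 2.0) -> Heatmap:
    """Return a height x width heatmap with a 2D Gaussian peak at center=(x, y)."""
    cx, cy = center
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def decode_heatmap(heatmap: Heatmap) -> Tuple[int, int]:
    """Return the (x, y) pixel location of the heatmap's maximum value."""
    best_val = float("-inf")
    best_xy = (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_val:
                best_val = value
                best_xy = (x, y)
    return best_xy


if __name__ == "__main__":
    # Encode a landmark at (x=12, y=5) on a 32x32 grid, then decode it back.
    hm = render_gaussian_heatmap(32, 32, center=(12, 5), sigma=2.0)
    print(decode_heatmap(hm))  # (12, 5)
```

Practical systems often refine the integer peak to sub-pixel precision, for example by shifting it toward the larger neighboring value; that refinement is omitted here.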
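The fourth contribution is a staged pipeline, and its data flow can be sketched with every network replaced by a plain callable: nearest-neighbour upsampling stands in for the coarse SR network, and the prior estimator, encoder, and decoder in the usage example are toy placeholders. The function names, the single-channel list-of-lists image type, channel-wise concatenation as the way priors and image features are "used jointly", and the assumption that the image encoder also takes the coarse SR result as input are all hypothetical, not details confirmed by the thesis.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]      # single-channel H x W map
FeatureMaps = List[Image]      # C maps, each H x W


def upsample_nearest(img: Image, scale: int) -> Image:
    """Toy stand-in for the coarse SR network: nearest-neighbour upsampling."""
    return [[v for v in row for _col in range(scale)]
            for row in img for _row in range(scale)]


def concat_channels(a: FeatureMaps, b: FeatureMaps) -> FeatureMaps:
    """Channel-wise concatenation; every map must have the same H x W."""
    size = (len(a[0]), len(a[0][0]))
    if any((len(m), len(m[0])) != size for m in a + b):
        raise ValueError("feature maps must share the same spatial size")
    return a + b


def face_sr_pipeline(lr: Image, scale: int,
                     coarse_sr: Callable[[Image, int], Image],
                     estimate_prior: Callable[[Image], FeatureMaps],
                     encode: Callable[[Image], FeatureMaps],
                     decode: Callable[[FeatureMaps], Image]) -> Tuple[Image, FeatureMaps]:
    """Data flow: coarse SR -> prior estimation -> fuse with image features -> decode."""
    coarse = coarse_sr(lr, scale)                  # 1. initial SR result
    prior = estimate_prior(coarse)                 # 2. priors estimated from the coarse SR
    features = encode(coarse)                      # 3. deep image features (assumed input)
    sr = decode(concat_channels(features, prior))  # 4. decoder sees both
    return sr, prior


if __name__ == "__main__":
    lr = [[1.0, 2.0], [3.0, 4.0]]

    def mean_decode(maps: FeatureMaps) -> Image:
        # Toy decoder: per-pixel mean over all channels.
        h, w = len(maps[0]), len(maps[0][0])
        return [[sum(m[y][x] for m in maps) / len(maps) for x in range(w)]
                for y in range(h)]

    sr, prior = face_sr_pipeline(lr, 2, upsample_nearest,
                                 estimate_prior=lambda img: [img],
                                 encode=lambda img: [img, img],
                                 decode=mean_decode)
    print(len(sr), len(sr[0]), len(prior))  # 4 4 1
```

Returning the prior alongside the SR image reflects that the network produces both outputs, so the estimated priors (for example, landmark heatmaps) can be evaluated on their own.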
Keywords/Search Tags: landmark localization, pose estimation, auto-encoder network, recurrent neural network, end-to-end network, convolutional network, face super-resolution