
Study On The Method Of Face Detection Based On Cascaded Convolutional Neural Network

Posted on: 2017-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: G Y Long
Full Text: PDF
GTID: 2428330566453064
Subject: Computer Science and Technology
Abstract/Summary:
Face detection is one of the most studied topics in the computer vision literature. In recent years, deep learning has achieved great success in computer vision, speech recognition, natural language processing, multimedia, and other areas. As one type of deep model, Convolutional Neural Networks (CNNs) have been widely used in face recognition, object detection, and image classification, all of which are difficult problems in computer vision and pattern recognition. Many face detection methods based on CNNs or Deep CNNs (DCNNs) have been proposed and have achieved great success. However, CNN-based methods, and DCNN-based methods in particular, usually need large amounts of training data and computational power. While they perform well on expensive GPU-based machines, they are often unsuitable for personal computers with limited computational resources. To address this problem, we adopt a cascade architecture built on CNNs that needs relatively little training data and limited computational resources while maintaining good performance and achieving fast face detection. Our work is divided into the following parts:

1) A 3-stage CNN cascade structure is designed by simplifying and optimizing CascadeCNN, which was proposed by Li et al. at CVPR 2015. The 3 CNNs for face vs. non-face binary classification in CascadeCNN are kept, and the 3 CNNs for bounding box calibration are dropped, because calibration is formulated as multi-class classification of discretized displacement patterns and needs a lot of training data. The network structure of the third stage is adjusted according to AlexNet, and the normalization region in the normalization layer is set to 5×5. A multi-resolution structure is adopted in the networks of the second and third stages. More precisely, the fully-connected layer of the first-stage network is concatenated to the fully-connected layer of the second-stage network. Likewise, in the third stage, the fully-connected layers of the previous two networks are concatenated to the fully-connected layer of the third network. It is shown that the multi-resolution structure achieves the same recall level with fewer false detection windows than the single-resolution structure.

2) Different groups of training parameters are used to optimize the network on the prepared training data until the cascade network is well trained. We prepared only about 200K training samples, which is small compared with the millions or tens of millions of samples that recent DCNN-based methods use. To choose the best-performing network, the network of every stage is trained with different training parameters, such as learning rate, batch size, and number of epochs. Experiments show that the network is easily trained on a single CPU core because of its relatively simple structure and the relatively small training set.

3) The detection performance of the cascade network is evaluated on the public face test sets FDDB and AFW, and the test results are analyzed in detail. The cascade network achieves a 77.43% recall rate on FDDB and 76.02% average precision on AFW. This result shows that the cascade network performs relatively well, but there is still a small gap to state-of-the-art face detection methods. Some extra experiments are conducted to analyze this result. They show that the cascade network detects faces accurately on in-the-wild images in most cases. However, the cascade network falls short when detecting faces that are extremely blurred or occluded. Furthermore, the cascade network runs fast even without code optimization: it takes only 511 ms to fully scan a 640×480×3 image on a 2.9 GHz CPU core.

4) To further accelerate training, Local Binary Pattern (LBP) features are combined with the cascade network. LBP features describe the local texture information of the face. Experiments show that a network taking LBP features as input occupies less memory and contains fewer parameters than a network taking raw images as input, so it spends less time on training. A comparison of detection performance shows that LBP can easily lose global and contextual information, so for face detection under complicated conditions, the cascade network trained on raw images slightly outperforms the LBP-trained cascade network.
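
To make the LBP step in part 4 concrete, here is a minimal sketch of the basic 3×3 LBP operator in plain Python. The abstract does not say which LBP variant, neighbourhood, or bit ordering the thesis uses, so the 8-neighbour layout, the clockwise bit order, and the function name `lbp_image` are assumptions made for illustration only.

```python
def lbp_image(gray):
    """Return the basic 3x3 LBP code of every interior pixel.

    `gray` is a 2-D grayscale image given as a list of equal-length rows.
    Border pixels are skipped, so the output is (h - 2) x (w - 2).
    """
    h, w = len(gray), len(gray[0])
    # Eight neighbours, clockwise starting from the top-left pixel.
    # This ordering is an assumption; any fixed ordering works.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            center = gray[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # Set the bit when the neighbour is at least as bright
                # as the centre pixel.
                if gray[y + dy][x + dx] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes
```

Each code depends only on the local ordering of intensities around one pixel, which is consistent with the abstract's remark that LBP captures local texture but can lose global and contextual information.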
Keywords/Search Tags:Face Detection, Limited Computational Resources, Cascaded Convolutional Neural Network, LBP Features
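
The three-stage cascade described in the abstract can be sketched as a chain of filters in which each stage only sees the windows that the previous stage accepted. This is a minimal illustration and not the thesis implementation: the `Window` tuple, the `run_cascade` function, and the per-stage thresholds are hypothetical, and each stage's CNN is represented by an arbitrary scoring callable.

```python
from typing import Callable, List, Sequence, Tuple

Window = Tuple[int, int, int]          # (x, y, size); hypothetical layout
Scorer = Callable[[Window], float]     # stand-in for one stage's face/non-face CNN


def run_cascade(windows: Sequence[Window],
                stages: Sequence[Tuple[Scorer, float]]) -> List[Window]:
    """Pass candidate windows through the stages in order.

    Each stage is a (scorer, threshold) pair. A window survives a stage
    when its score is at least the threshold, so the later and more
    expensive stages run on progressively fewer windows.
    """
    survivors = list(windows)
    for scorer, threshold in stages:
        survivors = [w for w in survivors if scorer(w) >= threshold]
        if not survivors:
            break  # nothing left to classify
    return survivors
```

In a real detector the first scorer would be the cheapest network, which is what lets a cascade reject most non-face windows early and keep the total scan time low.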