Real-time Human Detection And Recognition At Night Based On Deep Learning Approaches

Posted on: 2022-12-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Samah Abdalrazig Fadlelmola Ma
Full Text: PDF
GTID: 1488306779465044
Subject: Automation Technology
Abstract/Summary:
Object recognition is one of the most important techniques in computer vision. Human recognition is a branch of object recognition that classifies persons into the correct categories from still images and video sequences. It is required in many applications, such as autonomous driving, surveillance, and search and rescue operations. State-of-the-art human recognition systems focus on using deep convolutional neural networks to design their models. However, recognizing persons from images and videos remains challenging because people can make a large variety of movements and gestures. It is difficult to recognize persons who are captured with different facial expressions, under different walking conditions, from different viewpoints, and during various activities under various lighting conditions, especially in low light or at night. Human recognition is as important at night as in the daytime, or even more so.

This research addresses the problems of automatic human recognition in thermal images and video sequences at night. Several recognition technologies are studied, including face recognition, gender classification, gait recognition, human activity recognition, and human detection, and five new models for human recognition and detection are proposed in this thesis: the TIRFace Net model, the TIRNGait Net model, the YOLOv3-Human model, the MTIR-HAR model, and the TIRN-HD model. The five proposed models perform human recognition and detection in thermal images and video sequences at night with high accuracy. The main research work and contributions of this thesis are as follows.

(1) A custom DHU Night Dataset, which includes multimodal data of persons, is created to address the shortage of thermal-image night datasets. The thermal images in the custom DHU Night Dataset are pre-processed, labeled, and annotated for training the five proposed models.

(2) A new TIRFace Net model is proposed to improve the accuracy of recognizing faces from thermal infrared images and videos in low illumination or at night. The TIRFace Net model is designed with a new network architecture based on CNNs with 23 deep layers to detect and recognize persons by their faces. The model has three sub-networks. Net-1 aligns the thermal and visible images. Net-2 combines the thermal and visible images to create new sample images of faces for recognition. After the samples are created, Net-3 learns from the face samples to recognize the most discriminative face feature variations in order to predict the similarity of a given pair of images. The main contribution of this model is that more complex features can be obtained from thermal images through the new network architecture, which uses fewer layers and distributes multiple tasks across the three sub-networks Net-1, Net-2, and Net-3. These more complex features are essential for improving the recognition rate of thermal infrared face recognition at night. Two datasets containing both thermal and visible images (the DHUFO dataset and the DHU Night Face Dataset) are used for validation. The TIRFace Net model recognizes persons' faces in TIR images and videos and achieves higher accuracies on the DHUFO dataset (acc=98.50%) and the DHU Night Face Dataset (acc=98.70%) than the other related methods.

(3) A new TIRNGait Net model is proposed to improve the accuracy of gait recognition in thermal infrared images at night. The TIRNGait Net model is designed with a new NGait Net network architecture based on CNNs to recognize human gaits under different walking conditions at night. The network contains an input layer, four deep layers, a flatten module, a fully connected (FC) layer, and an output layer. The four hidden layers are designed by setting parameters suited to the task of gait recognition, which increases the accuracy and speed of gait recognition. The NGait Net can enhance thermal infrared images, obtain different gait features such as the head, torso, two hands, and two legs, and create balanced samples for recognition. The main contribution of this model is that more complex gait features can be obtained from thermal images with fewer layers and in less time. The four hidden layers Conv1, Conv2, Conv3, and Conv4 perform multiple tasks as follows. Conv1 implements the batch normalization function with padding to make the data smoother and less pixelated. Conv2 implements the Gabor filter function to detect the edges, connect lines between the edges to obtain the silhouettes, and extract gait features from a gait energy image (GEI). Conv3 implements the combined gait asymmetry metric (CGAM) method to integrate different parameters of gait patterns based on speed and so obtain balanced samples for recognition. Conv4, together with the FC layer, performs non-linear matching of the gait features of the person against the gait features of the persons in the dataset to predict a specific person at different walking speeds. The model can recognize persons in GEIs at different gait speeds because the GEI is a good descriptor of the silhouettes, which improves the accuracy of gait recognition at night. The experimental results show that the TIRNGait Net model achieves accuracies of 97%, 86%, and 87% on the DHU Night Dataset in normal, quick, and slow walking conditions, respectively. On the CASIA C dataset, the model achieves higher accuracies in quick walking conditions (acc=98%) and in normal walking conditions with a bag (acc=86%) than the other related methods.

(4) A novel fusion model, called the YOLOv3-Human model, is designed to integrate the characteristics of physical (face) and behavioral (gait) biometrics to recognize humans. The main goal of this model is to automatically recognize persons in different walking conditions by combining face recognition, gender classification, and gait recognition. The new network architecture combines face classifiers and gait classifiers into one model by modifying the architecture of the YOLOv3 network, which includes only one classifier. The proposed network has three sub-networks: OTI-Net, PDM-Net, and PRM-Net. The OTI-Net optimizes the thermal infrared images to provide more accurate features of the face, gait, and body segments. The PDM-Net processes the optimized images to detect persons of different sizes. The PRM-Net classifies the persons in the images for recognition. The main contribution of this model is to recognize persons in thermal infrared images and videos by fusing face, gender, gait, and body-shape features, which improves human recognition accuracy at night. Compared with individual face recognition models and gait recognition models on the same night datasets, the YOLOv3-Human model using fusion features performs better in recognizing persons in terms of accuracy and speed. In the experiments, the proposed YOLOv3-Human model achieves accuracies of 99% for face recognition with gender recognition and 90% for gait recognition on the DHU Night Dataset. On the FLIR dataset and the KAIST dataset, the YOLOv3-Human model achieves good true-positive (TP) detection when recognizing multiple small-sized persons, and it also achieves higher average precision (AP) scores of 67.54% and 65.01%, respectively. In addition, the model can recognize persons in real-time video sequences.

(5) A new MTIR-HAR model is proposed to deal with the problem of multi-view human activity recognition at night. The MTIR-HAR model aims to improve the accuracy of recognizing human movements and activities in raw data captured from thermal infrared images. A new network architecture based on RNNs is designed by adding six deep layers of NNs to the original RNNs to improve the accuracy of human activity recognition at night. The new network contains an input layer, six deep RNN layers, an averaging module, a Softmax activation function module, a prediction module, and an output layer. This network simulates the features of persons in raw data to obtain the parameters for recognition. The main contributions of the MTIR-HAR model are as follows. Firstly, the model has six deep layers in its RNNs, which can obtain more information (complex features) from human physical activities in real-world night surveillance environments and can represent the movement of human body parts via variations in the thermal infrared images captured at night. Secondly, the model overcomes many difficulties associated with traditional methods, such as power consumption, high-speed computing requirements, and the need for additional hardware or wireless transmission. Thirdly, the model achieves a higher recognition rate for human activity recognition in raw data on different types of datasets. In the experiments, the proposed MTIR-HAR model achieves accuracies above 98% on the MHAD dataset and above 80.2% on the DHU Night Dataset, which are higher than the results of the SVM method and the LSTM model.

(6) A novel TIRN-HD model is proposed to detect persons in thermal images and real-time video sequences at night. The proposed TIRN-HD model aims to improve human detection accuracy in real-world night surveillance environments. We design a new network by enhancing the architecture of the Tiny-yolov3 network, which differs from other methods in internal design, pre-processing, feature extraction, and detection algorithm. The new network architecture contains the TIE-Net and the PDM-Net, interconnected by the Up-sampling layer. The thermal infrared images are enhanced and optimized by the TIE-Net, which contains the Conv and DeConv structures of GAN-Net among the initial convolution layers to reduce information loss between the convolution layers. The Up-sampling layer is added to connect the output of the TIE-Net with the input of the PDM-Net and to reduce the sampling rate of the data. The PDM-Net implements feature extraction (Darknet-53) to obtain more complex features of a person and person detection (PDL-Net) for recognition. The TIRN-HD model uses an RGB pre-trained model to create new weights, which are used to learn thermal infrared images. To predict the persons in thermal images and real-time videos, the parameters of the persons in the test image are matched with the parameters of the persons in the dataset images. The main contribution of this model is to extend RGB YOLO person detection to thermal images by using the enhanced Tiny-yolov3 network. The experimental results show that, compared with other related methods, the proposed TIRN-HD model achieves the highest AP scores and TP detection, with less delay in real time, when persons of different sizes walk under various lighting conditions, in different weather conditions, and at different distances from the camera.

In this research, we create a custom DHU Night Dataset and propose five models to recognize persons in TIR images and videos at night. The five proposed models (TIRFace Net, TIRNGait Net, YOLOv3-Human, MTIR-HAR, and TIRN-HD) outperform the other related methods in terms of accuracy and speed. Furthermore, the proposed YOLOv3-Human model and TIRN-HD model, which are based on the YOLO architectures, use the fusion of face and gait features and fusion body transactions for human recognition, and they achieve the highest recognition rate in less time than the models based on CNNs and RNNs (TIRFace Net, TIRNGait Net, and MTIR-HAR), which use individual face or gait features.
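The gait energy image (GEI) used by the TIRNGait Net model is commonly defined as the pixel-wise average of aligned binary silhouettes over a gait cycle. The following is a minimal sketch of that standard definition only, not the thesis implementation; the function name `gait_energy_image`, the `Silhouette` type alias, and the toy frames are hypothetical, and the silhouettes are assumed to be already segmented, size-normalized, and aligned.

```python
from typing import List

Silhouette = List[List[int]]  # binary mask: 1 = person pixel, 0 = background


def gait_energy_image(silhouettes: List[Silhouette]) -> List[List[float]]:
    """Average same-sized binary silhouettes pixel by pixel."""
    if not silhouettes:
        raise ValueError("need at least one silhouette")
    height, width = len(silhouettes[0]), len(silhouettes[0][0])
    for frame in silhouettes:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all silhouettes must have the same size")
    n = len(silhouettes)
    # Each output pixel is the fraction of frames in which it was foreground.
    return [
        [sum(frame[y][x] for frame in silhouettes) / n for x in range(width)]
        for y in range(height)
    ]


# Toy 2x3 "gait cycle" of three frames (hypothetical data, for illustration).
frames = [
    [[0, 1, 0], [1, 1, 0]],
    [[0, 1, 0], [0, 1, 1]],
    [[0, 1, 0], [0, 1, 0]],
]
gei = gait_energy_image(frames)
```

Pixels that are foreground in every frame (the static torso, for example) end up near 1, while pixels swept by the limbs take intermediate values, which is why a single GEI can summarize both body shape and motion.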
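The TP detection and AP scores reported for the YOLOv3-Human and TIRN-HD models depend on how predicted boxes are matched to ground-truth boxes. The abstract does not state the matching protocol, so the sketch below shows one common convention as an assumption: intersection over union (IoU) with a 0.5 threshold and greedy one-to-one matching. The names `Box`, `iou`, and `count_true_positives` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(
    predictions: List[Box], ground_truth: List[Box], threshold: float = 0.5
) -> int:
    """Greedily match predictions to ground truth, one match per box.

    Predictions are processed in the given order, which is assumed to be
    sorted by descending confidence. The 0.5 threshold is an assumption.
    """
    unmatched = list(ground_truth)
    true_positives = 0
    for pred in predictions:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= threshold:
            unmatched.remove(best)  # each ground-truth box is used once
            true_positives += 1
    return true_positives


# Hypothetical boxes: two predictions overlap a person, one is a false alarm.
preds = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
truth = [(1, 1, 10, 10), (20, 20, 28, 30)]
tp = count_true_positives(preds, truth)
```

Precision and recall at a given threshold follow from the TP count (TP divided by the number of predictions and by the number of ground-truth boxes, respectively), and AP summarizes the precision-recall curve obtained by sweeping the confidence cutoff.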
Keywords/Search Tags: human recognition, human detection, face recognition, gait recognition, human activity recognition (HAR), convolutional neural network (CNN), recurrent neural network (RNN), deep learning, feature extraction, thermal infrared (TIR) image, YOLOv3, Tiny-yolov3