
People Detection In Indoor Scene Based On Deep Learning Method

Posted on: 2018-07-08   Degree: Master   Type: Thesis
Country: China   Candidate: Y L Li
GTID: 2428330569998755   Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the big-data era, large volumes of image and video data are generated every day. Computer vision, the branch of computing that processes images, is increasingly applied in daily life through tasks such as object recognition, object detection, semantic segmentation and target tracking. Driven by advances in computer technology, especially parallel computing, high-performance computing and GPUs, computer vision has developed rapidly in recent years. Object detection, one of the most fundamental problems in computer vision, has achieved breakthroughs. Because of its importance and performance, it is in growing demand for applications such as activity recognition, autonomous driving, intelligent surveillance systems and military target detection, and this growing demand in turn drives the development of detection technology. Within object detection, this thesis focuses on people detection in indoor scenes, mainly using deep learning methods. With supervised data, we explore deep-learning-based human detection, including in-scene people detection based on CNNs and RNNs and human detection based on the fusion of region features and local features. We also explore how to detect people under weak supervision and propose a video-based weakly-supervised detection method. The work of this thesis is summarized in the following three aspects:

1) A scene detection method based on a CNN and an RNN. A deep convolutional network extracts target features and encodes the image into deep features, and an RNN then decodes these deep features into target bounding boxes. Specifically, the image is first divided into a grid in which each cell is a 1024-dimensional vector corresponding to one region of the input image. This vector encodes the characteristics of the corresponding region and carries rich information, such as the locations of targets. An LSTM unit reads this representation vector and decodes the region. At each step of the recurrent decoding, the LSTM outputs a new bounding box and its score, and each step outputs a target that was not predicted at earlier steps; the box scores are encouraged to form a decreasing sequence. When the score output by the LSTM falls below a threshold, a stop signal is generated. Finally, the outputs are collected as the final predictions for the multiple instances within the region. The method achieves high detection precision at good detection speed.

2) A people detection method based on the fusion of region features and local features. This part focuses on human detection in indoor scenes, especially head detection in crowded scenes. Detecting human heads in crowded scenes is difficult because of dramatic variation in clothing appearance, the small size of each person and strong occlusion. Traditional bottom-up box methods and region regression networks suffer from either low recall or low accuracy. We therefore integrate local head information into the region regression model to improve both recall and accuracy. We first use a region regression network to predict the bounding boxes and corresponding box scores of multiple head instances in a region. We then use these bounding boxes to train a local head classifier. Finally, we propose an adaptive fusion method that combines the region score and the local score of each bounding box, yielding a more accurate box score. The fusion method learns its optimal parameters automatically from the data, and the algorithm performs well on a crowded dataset, with detection accuracy significantly better than the current state-of-the-art methods.

3) A video-based weakly-supervised people detection method for indoor scenes. Although human detection can reach high performance when labeled data is available, such labels are often difficult to obtain and the data is continually updated. Unsupervised or weakly-supervised people detection methods are therefore very important, yet there is little related work. For indoor scene data, this thesis therefore introduces a video-based weakly-supervised detection method. Because we are concerned with detecting people in indoor scenes, we only need to supply videos that contain people, and the method automatically learns a detection model for the scene. The approach consists of a training process and a testing process. The training process consists of the following phases: 1) foreground extraction based on a Gaussian mixture model; 2) partitioning the foreground into instances with a clustering algorithm; 3) model training on pseudo-labels. Once the detection model has been learned in the training phase, it can be applied directly to test images and produces end-to-end output, which makes testing very convenient.
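The grid representation in part 1) can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis implementation: it assumes the CNN backbone downsamples the image by a fixed stride (32 here, a hypothetical value) and outputs an H' x W' x 1024 feature map, and it uses random numbers in place of real CNN features. The pixel window attached to each cell is only the nominal stride-sized tile; a real network's receptive field is larger.

```python
import numpy as np

def grid_cells(feature_map: np.ndarray, stride: int = 32):
    """Yield (row, col, vector, region) for every cell of an H' x W' x C feature map.

    `region` is the nominal (x1, y1, x2, y2) pixel tile the cell covers,
    assuming the backbone downsamples by a fixed `stride` (hypothetical value).
    """
    rows, cols, _ = feature_map.shape
    for r in range(rows):
        for c in range(cols):
            region = (c * stride, r * stride, (c + 1) * stride, (r + 1) * stride)
            yield r, c, feature_map[r, c], region

# Stand-in for CNN output on a 480x640 image: a 15x20 grid of 1024-d vectors.
features = np.random.rand(480 // 32, 640 // 32, 1024).astype(np.float32)
cells = list(grid_cells(features))
```

Each yielded vector is what a recurrent decoder would read for that cell.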
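The recurrent decoding with a stop threshold in part 1) can be sketched as the loop below. The trained LSTM is replaced by a hypothetical stand-in (`make_toy_step`) that replays fixed outputs, and the names `decode_region`, `threshold` and `max_steps` are illustrative assumptions; the training loss that encourages decreasing scores is not shown.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]          # (x1, y1, x2, y2), hypothetical convention
Step = Callable[[object], Tuple[Box, float, object]]

def decode_region(step: Step, state: object, threshold: float = 0.5,
                  max_steps: int = 10) -> List[Tuple[Box, float]]:
    """Unroll the recurrent decoder for one grid region.

    Each call to `step` stands in for one LSTM step: it returns a new box,
    its score and the updated recurrent state. Decoding stops (the "stop
    signal") as soon as a score falls below `threshold`.
    """
    detections: List[Tuple[Box, float]] = []
    for _ in range(max_steps):
        box, score, state = step(state)
        if score < threshold:
            break
        detections.append((box, score))
    return detections

def make_toy_step(outputs: List[Tuple[Box, float]]) -> Step:
    """Hypothetical stand-in for a trained LSTM: replays fixed (box, score) pairs."""
    def step(i):
        if i < len(outputs):
            box, score = outputs[i]
        else:
            box, score = (0.0, 0.0, 0.0, 0.0), 0.0
        return box, score, i + 1
    return step

toy = make_toy_step([((10.0, 10.0, 50.0, 80.0), 0.95),
                     ((60.0, 12.0, 95.0, 85.0), 0.81),
                     ((0.0, 0.0, 5.0, 5.0), 0.20)])
result = decode_region(toy, 0, threshold=0.5)
```

The `max_steps` cap is an added safeguard (an assumption, not from the abstract) so decoding terminates even if no score ever drops below the threshold.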
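The abstract states that the fusion in part 2) learns its parameters from data but does not give its form. As one hypothetical instance, the sketch below learns a logistic combination of the region score and the local head-classifier score by gradient descent; the function names, learning rate and toy scores are all made up for illustration and are not the thesis method.

```python
import numpy as np

def fit_fusion(region, local, labels, lr=0.5, epochs=2000):
    """Fit w = (w_region, w_local, bias) by gradient descent on the logistic loss,
    so that sigmoid(w_region * region + w_local * local + bias) predicts a true head."""
    region = np.asarray(region, dtype=float)
    local = np.asarray(local, dtype=float)
    X = np.stack([region, local, np.ones_like(region)], axis=1)
    y = np.asarray(labels, dtype=float)
    w = np.zeros(3)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # current fused scores
        w -= lr * X.T @ (p - y) / len(y)   # logistic-loss gradient step
    return w

def fuse(w, region, local):
    """Combine the region score and the local score of each box with learned weights."""
    z = w[0] * np.asarray(region, dtype=float) + w[1] * np.asarray(local, dtype=float) + w[2]
    return 1.0 / (1.0 + np.exp(-z))

# Toy validation boxes: 1 = true head, 0 = false positive (made-up numbers).
region = [0.90, 0.60, 0.55, 0.70, 0.30, 0.65]
local = [0.95, 0.90, 0.85, 0.10, 0.20, 0.15]
labels = [1, 1, 1, 0, 0, 0]
w = fit_fusion(region, local, labels)
scores = fuse(w, region, local)
```

In this toy data the region score alone misranks some boxes, while the learned weights lean on the local score so that every true head outscores every false positive.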
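Phases 2) and 3) of the training process in part 3) can be sketched as turning a binary foreground mask into pseudo-label boxes. The sketch assumes the mask already comes from a Gaussian-mixture background subtractor (OpenCV's `cv2.createBackgroundSubtractorMOG2` is one such implementation). Because the abstract does not name its clustering algorithm, the sketch substitutes simple grouping of 4-connected foreground pixels; `pseudo_labels` and `min_pixels` are hypothetical names. The resulting boxes would then serve as training targets for the detector, which is not shown.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), inclusive pixel coordinates

def pseudo_labels(mask: List[List[int]], min_pixels: int = 4) -> List[Box]:
    """Group foreground pixels (non-zero) into 4-connected instances; one box per instance."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                # Breadth-first flood fill collects one connected instance.
                queue = deque([(y, x)])
                seen[y][x] = True
                pixels = []
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(pixels) >= min_pixels:  # discard small noise blobs
                    ys = [p[0] for p in pixels]
                    xs = [p[1] for p in pixels]
                    boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes

# Toy foreground mask: two person-like blobs plus one isolated noise pixel.
mask = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
boxes = pseudo_labels(mask)
```

The `min_pixels` filter is an assumed heuristic for suppressing background-subtraction noise; a real pipeline would tune it to the expected person size.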
Keywords/Search Tags: human detection, in-scenes, convolutional neural network, recurrent neural network, feature fusion, cluster