Crowd Counting Algorithm Based On Multi-Scale Convolutional Neural Network

Posted on: 2020-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: X Peng
GTID: 2518306311482934
Subject: Master of Engineering
Abstract/Summary:
With the rapid development of GPUs, the computing power of computers has improved. Neural networks are no longer limited by computing power, and research on deep learning and neural networks has intensified. Crowd counting is the real-time estimation of the crowd density or the number of people in a crowd image or video. Early methods relied on object detection, whereas current methods use convolutional neural networks for end-to-end estimation. Crowd density estimation based on convolutional neural networks generally works from a density map, which effectively mitigates the problems caused by scale variation, occlusion of targets by other objects, and perspective distortion.

This thesis first reviews the state of research on crowd density estimation in China and abroad and its main difficulties, and then introduces the related principles and some classical algorithms. Finally, it proposes a convolutional neural network framework and a self-supervised task that assists network learning. Tests on several public datasets demonstrate the efficiency and feasibility of the proposed model. The main work is as follows.

Firstly, a single-column, multi-scale convolutional neural network is proposed, which provides a data-driven deep learning method that can handle various scenes and produce accurate count estimates. The network consists of a front end and a middle end for two-dimensional feature extraction, and a back end that restores the density map. Stacked pooling is used instead of the max-pooling layer to increase the scale invariance of the model without introducing additional parameters. The front end adopts part of the VGG-16 structure, while the middle end adopts a Feature Map Encoder (FME) to break the independence between different columns and better extract multi-scale feature information. The back end uses three columns, each with five layers of dilated convolution, with different dilation rates to enlarge the receptive field while maintaining the same resolution, so as to generate a higher-quality crowd density map. A relative count loss is introduced to improve performance on sparse crowds. The model shows good results on two of the most widely used datasets.

Secondly, a self-supervised task is proposed for training crowd counting networks. Unlabeled crowd images are used during training to significantly improve performance. The main idea is that, even without knowing the exact number of people in a crowd image, an image block sampled from that image must contain the same number of people as the original image or fewer. This provides a technique for generating sequences of sub-images that can be used to train a network to estimate whether one image contains more people than another. In this thesis, the abundant crowd images available on the Internet are used to generate a ranked dataset that assists the training of the crowd counting network, which significantly improves counting performance.
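The ranking idea above (a block sampled from a crowd image contains the same number of people or fewer) can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `nested_crops` and `pairwise_ranking_loss`, the shrink factor `scale`, and the hinge-style loss with a `margin` are assumptions chosen for illustration, and the predicted counts are plain numbers standing in for the output of a counting network.

```python
import random


def nested_crops(width, height, num_crops=4, scale=0.75, seed=0):
    """Return boxes (left, top, right, bottom), each contained in the previous one.

    The first box is the full image. Every later box is a randomly placed
    sub-region of the box before it, so it can only contain the same number
    of people or fewer. `scale` (hypothetical parameter) must be in (0, 1].
    """
    rng = random.Random(seed)
    boxes = [(0, 0, width, height)]
    for _ in range(num_crops - 1):
        left, top, right, bottom = boxes[-1]
        new_w = max(1, int((right - left) * scale))
        new_h = max(1, int((bottom - top) * scale))
        # Choose the offset so that the new box stays inside the previous box.
        new_left = rng.randint(left, right - new_w)
        new_top = rng.randint(top, bottom - new_h)
        boxes.append((new_left, new_top, new_left + new_w, new_top + new_h))
    return boxes


def pairwise_ranking_loss(counts, margin=0.0):
    """Average hinge loss over all ordered pairs of predicted counts.

    counts[i] is the predicted count for the i-th (larger) box and counts[j]
    for a later (smaller, nested) box. A pair is penalised only when the
    smaller box is predicted to hold more people than the larger one.
    """
    total, pairs = 0.0, 0
    for i in range(len(counts)):
        for j in range(i + 1, len(counts)):
            total += max(0.0, counts[j] - counts[i] + margin)
            pairs += 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    for box in nested_crops(640, 480):
        print(box)
    print(pairwise_ranking_loss([10.0, 8.0, 8.0, 3.0]))  # consistent ordering
    print(pairwise_ranking_loss([5.0, 7.0]))              # violated ordering
```

No head annotations are needed to build such sequences, which is why unlabeled images can supply the ranking signal; the counting loss on labeled images would still be required to anchor the absolute scale of the predictions.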
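The claim that dilated convolution enlarges the receptive field while keeping the resolution can be checked with a small sketch. The abstract does not state the kernel size or the dilation rates, so the 3-tap kernel and the rates 1, 2 and 3 below are assumptions, and a 1-D signal is used in place of a 2-D feature map for brevity. The helper names `dilated_conv1d` and `receptive_field` are hypothetical.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Stride-1 dilated cross-correlation with zero 'same' padding (odd kernel).

    Kernel taps are spaced `dilation` samples apart, so a k-tap kernel spans
    dilation * (k - 1) + 1 input samples without adding any parameters.
    """
    k = len(kernel)
    pad = dilation * (k - 1) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def receptive_field(num_layers, kernel_size, dilation):
    """Receptive field of a stack of identical stride-1 dilated conv layers."""
    return 1 + num_layers * dilation * (kernel_size - 1)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = dilated_conv1d(x, [1.0, 1.0, 1.0], dilation=2)
    print(len(x), len(y))  # same length: resolution is preserved
    for d in (1, 2, 3):    # assumed dilation rates, one per column
        print(d, receptive_field(num_layers=5, kernel_size=3, dilation=d))
```

Under these assumptions, five stacked 3-tap layers see 11, 21 and 31 input samples for dilation rates 1, 2 and 3, while the output length always equals the input length, so parallel columns with different rates would cover different scales at the same resolution.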
Keywords/Search Tags: Crowd Counting, Convolutional Neural Network, Learning To Rank, Dilated Convolution, Feature Map Encoder