
Image Crowd Counting Based On Convolutional Neural Network

Posted on: 2021-01-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Y Wang
Full Text: PDF
GTID: 1368330602494196
Subject: Control Science and Engineering
Abstract/Summary:
With rapid economic development, video surveillance plays an important role in maintaining public safety and social stability. In recent years, intelligent monitoring systems that process data automatically have developed rapidly, providing efficient analysis of video and image data to assist decision-making. Crowd counting is an essential component of such systems; it uses computer vision techniques to accurately estimate the number of people in a single image. It has therefore been widely applied in fields such as security early warning, urban planning, intelligent business, and traffic dispatching.

With the development of deep learning, convolutional neural networks have greatly improved crowd counting performance and substantially reduced counting errors. However, many challenges remain in practical application scenarios, such as crowd scale variation, perspective, background interference, and non-uniform crowd distribution. To address these problems and further improve counting accuracy, we propose targeted solutions in terms of network architecture, loss function, training method, and data preprocessing. The main work and contributions of this thesis are summarized as follows:

1. For crowd scale variation, we propose a skip-connection convolutional neural network and a multi-scale training method to recognize crowd objects of different sizes. Conventional convolutional neural networks cannot cope with object scale variation because they lack scale invariance. To handle the multiple scales of crowd objects, we design several multi-scale convolutional units and concatenate them through skip connections, which provides receptive fields of different sizes within the network and models crowd objects of different sizes. Furthermore, the multi-scale training method adapts the proposed architecture to inputs of the same object at multiple scales, thereby effectively improving counting accuracy.

2. A de-background detail convolutional network and a weighted Euclidean loss are designed to deal with complex background interference in crowd images. Our experiments show that the low-frequency components in the base layer of a crowd image primarily contain background information, while the detail layer that remains after removing the base layer is dominated by the foreground crowd. On this basis, the proposed model takes the detail layer as input and extracts crowd features from it to estimate the crowd density map. In this way, the negative influence of background interference is minimized. Using the detail layer also effectively compresses the mapping range and facilitates network training. In addition, the weighted Euclidean loss computes the Euclidean distances for the background and the crowd with different weights, thereby penalizing the misidentification of background as crowd. Experimental results demonstrate that these mechanisms are effective in overcoming background interference and reducing counting errors.

3. To deal with non-uniform crowd distributions in images, a second-order convolutional attention network is designed to adapt to changes in crowd density across different regions of the image. Some local regions of a crowd image are often dense while other areas are sparse, which increases the difficulty of crowd density estimation. In this thesis, multiple second-order convolution modules are introduced after the network backbone to enhance feature extraction and to model a variety of crowd density distributions. A context attention module based on dilated convolutions adaptively adjusts the output features of each second-order convolution module so that it focuses on crowd areas of different densities, thus improving the robustness of the network to complex crowd distributions.

4. To generate high-quality crowd density maps, we propose an encoder-decoder architecture based on multi-level feature fusion. The crowd density map is the main regression target in most counting methods, and its quality directly influences counting accuracy. Connections are established between the encoder and the decoder to fuse the low-level local details and the high-level semantic information of the network, which improves the crowd features. The proposed model generates a high-resolution crowd density map, so that more pixels can be used to describe local crowd details. In addition, an adaptive adjustment mechanism for the fused features is designed in the network: multiple densely connected dilated convolution layers extract multi-scale context features that guide a channel attention mechanism to improve the feature fusion process. A pixel-wise classification task between background and crowd is used to assist density map estimation. Experiments on multiple datasets show that these modules improve both density map quality and counting accuracy.
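
The weighted Euclidean loss in contribution 2 can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function names `weighted_euclidean_loss` and `crowd_count`, the weight values `bg_weight` and `fg_weight`, and the rule that a pixel counts as background when its ground-truth density is zero. The sketch uses plain Python lists rather than any deep learning framework.

```python
def weighted_euclidean_loss(pred, target, bg_weight=2.0, fg_weight=1.0):
    """Weighted sum of squared errors between two density maps.

    pred and target are 2D lists of floats with the same shape.
    Assumption: a pixel is background when its ground-truth density is 0.
    Background errors are scaled by bg_weight, crowd errors by fg_weight.
    """
    if len(pred) != len(target):
        raise ValueError("density maps must have the same number of rows")
    total = 0.0
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("density maps must have the same row length")
        for p, t in zip(pred_row, target_row):
            weight = bg_weight if t == 0 else fg_weight
            total += weight * (p - t) ** 2
    return total


def crowd_count(density_map):
    """Estimated head count: the sum of all density values."""
    return sum(sum(row) for row in density_map)


# Toy 2x2 example: the ground truth has crowd density only in the right column.
target = [[0.0, 0.5],
          [0.0, 0.5]]
# The prediction puts density on one background pixel and misses one crowd pixel.
pred = [[0.5, 0.5],
        [0.0, 0.0]]

weighted = weighted_euclidean_loss(pred, target)                    # 0.75
unweighted = weighted_euclidean_loss(pred, target, bg_weight=1.0)   # 0.5
print(weighted, unweighted, crowd_count(target), crowd_count(pred))
```

In this toy case both maps sum to the same count, yet the loss is nonzero, and the error on the background pixel contributes twice as much as the error on the crowd pixel. That is one way a loss of this kind can discourage a network from placing density on background regions.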
Keywords/Search Tags: Crowd counting, Crowd density map estimation, Deep learning, Convolutional neural network, Computer vision, Receptive field, Detail layer, Attention mechanism, Feature fusion