Research On Crowd Counting Method Based On Residual 3D Convolutional Neural Network

Posted on: 2021-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: X X Xie
GTID: 2428330611967612
Subject: Software engineering
Abstract/Summary:
With the continued development of the global economy, the implementation of China's urbanization policies has steadily promoted urban economic development, driving migration to large cities and significantly increasing their populations. As a result, public places in some large cities can become overcrowded, and overcrowding can easily lead to security incidents. Video surveillance can be used to monitor and analyze crowd density as an important indicator for early warning of such incidents, but video from traditional monitoring equipment relies on manual processing, which is cumbersome and delivers information with a lag. Using computer vision object detection technology to make monitoring equipment intelligent, so that it can automatically, efficiently, and accurately count the people in a video, therefore has great application value in areas such as risk warning for public security and scheduling at transportation hubs.

Crowd counting is a research hotspot in computer vision object detection. Its goal is to build a model that takes image data as input and, after training, either outputs the number of people directly or generates a predicted density map whose pixel values are summed to obtain the crowd count. The main problems in the task are that mutual occlusion in overcrowded images makes features hard to recognize, and that the camera's shooting angle easily causes head scale to vary within a single frame. To address these problems, current solutions based on convolutional neural networks generally use multiple parallel network columns to extract the spatial features of a single frame. However, such networks ignore the temporal context of the video frame sequence. In addition, to save training time each single-column network has to be trained in advance, which increases the complexity of model training, and because the columns do not share feature information, the number of network parameters and the overall computational burden increase.

Therefore, this paper proposes a Residual 3D Convolutional Neural Network crowd counting (Res-3D-CNN) model, which completes the counting task by generating a predicted density map and accumulating it pixel by pixel. On the one hand, the 3D convolution kernels in the Res-3D-CNN model extract both the spatial feature information of each video frame and the temporal feature information of the continuous frame sequence. The extracted features incorporate the motion information of adjacent frames and can mitigate the noise caused by crowd occlusion in an individual frame. On the other hand, the model uses a cross-connection combination method whose effect is equivalent to a multi-column convolutional network architecture. The equivalent multi-column network can combine and stack convolution kernels of different sizes, which improves the model's feature expression ability and lets it extract features of people at different scales in the image, addressing the differences in head scale.

The proposed Res-3D-CNN model was tested on commonly used crowd datasets of continuous video frames, including the Mall dataset and the WorldExpo'10 dataset, and its results were compared with the experimental results of current crowd counting algorithms on the same datasets. The comparative analysis verifies the feasibility of the proposed algorithm. Compared with multi-column network architectures, the Res-3D-CNN model also reduces redundant parameters and the computing resources required of the device.
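To make two of the ideas in the abstract concrete, the following is a minimal pure-Python sketch, not the thesis's implementation. It assumes that a density map is built by placing one unit-mass Gaussian per annotated head (a common convention that the abstract does not specify), and it shows a single zero-padded 3D convolution over a (time, height, width) clip with an identity skip connection added to its output. All function names, the kernel values, and the `sigma` parameter are hypothetical and chosen only for illustration.

```python
import math

def gaussian_density_map(heads, height, width, sigma=1.5):
    """Build a density map with one unit-mass Gaussian blob per head (assumed convention)."""
    density = [[0.0] * width for _ in range(height)]
    for row, col in heads:
        # Unnormalised Gaussian weights centred on the head position.
        weights = [[math.exp(-((r - row) ** 2 + (c - col) ** 2) / (2 * sigma ** 2))
                    for c in range(width)] for r in range(height)]
        total = sum(map(sum, weights))
        # Normalise inside the image so each person contributes a mass of 1.
        for r in range(height):
            for c in range(width):
                density[r][c] += weights[r][c] / total
    return density

def count_from_density(density):
    """Estimated crowd count: the sum of all pixel values in the density map."""
    return sum(map(sum, density))

def residual_conv3d(clip, kernel):
    """Return clip + conv3d(clip, kernel) for a (T, H, W) clip.

    Uses zero padding and cross-correlation, and assumes odd kernel sizes.
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    pt, ph, pw = kt // 2, kh // 2, kw // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(T):
        for h in range(H):
            for w in range(W):
                acc = 0.0
                # The kernel spans neighbouring frames (dt) as well as pixels (dh, dw).
                for dt in range(kt):
                    for dh in range(kh):
                        for dw in range(kw):
                            tt, hh, ww = t + dt - pt, h + dh - ph, w + dw - pw
                            if 0 <= tt < T and 0 <= hh < H and 0 <= ww < W:
                                acc += kernel[dt][dh][dw] * clip[tt][hh][ww]
                out[t][h][w] = clip[t][h][w] + acc  # identity skip connection
    return out

heads = [(2, 3), (5, 5), (7, 1)]  # hypothetical head annotations (row, col)
density = gaussian_density_map(heads, height=10, width=10)
print(round(count_from_density(density), 6))  # 3.0, one unit of mass per head
```

Because each Gaussian is normalised to sum to 1 inside the image, summing the map recovers the number of annotated heads. In `residual_conv3d`, the innermost loops read from adjacent frames, which is how a 3D kernel mixes temporal and spatial information in one operation.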
Keywords/Search Tags: Computer vision, CNN, 3D Convolution, Crowd counting, Spatiotemporal features