A Multi-scale Deep VLAD Convolutional Network For Crowd Counting

Posted on: 2019-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Sun
GTID: 2428330542494414
Subject: Computer Science and Technology
Abstract/Summary:
With the development of computer technology, especially artificial intelligence in computer vision, intelligent video surveillance has become an important part of building smart cities. Traditional video surveillance technology cannot satisfy current demand, and systems based on artificial intelligence algorithms are now the leading approach to analyzing images and videos in the surveillance domain. As one of the most important branches of this field, crowd counting has been an active research topic in computer vision. Crowd counting refers to estimating the number of pedestrians in still images or videos from surveillance cameras. Differences between surveillance cameras and the complex backgrounds behind pedestrians make it difficult to estimate the number of people accurately.

Unlike traditional counting methods, convolutional neural networks (CNNs) learn and extract features automatically from pedestrian images, skipping motion segmentation and handcrafted feature detectors. CNN-based methods usually outperform traditional approaches and appear to be more robust across different scenarios. Although they have achieved higher accuracy, existing CNN-based methods still fail to predict the crowd count reliably under illumination changes, partial occlusions, diverse crowd distributions and perspective distortions. Because training data is scarce, almost all existing approaches choose relatively shallow CNN models, and thus fail to capture the details of pedestrian images and do not make proper use of the learned features.

To address these problems, a multi-scale deep NetVLAD crowd model is proposed. The proposed deep CNN crowd model extracts features from crowd images more effectively than most relatively shallow models. VLAD coding is used to deal with the multi-scale problem of the images and to handle the learned CNN features more reasonably, which makes the crowd counting network more robust. Existing CNN-based approaches usually use only features from the highest layer, which is far from satisfactory for capturing easily overlooked details in the images. In this thesis, both high-stage and low-stage features are used: the model estimates the crowd count through a crowd density map, and the lower-stage features supply complementary information. In addition, a novel data augmentation method is proposed for training very deep neural networks for crowd counting. These techniques generate diverse training samples and prevent severe over-fitting. The proposed method is end-to-end trainable and is more effective and robust to viewpoint changes, scale variation and occlusions in extremely congested scenes. The method has been tested on three benchmark datasets: the challenging UCF_CC_50 dataset, the ShanghaiTech dataset and the WorldExpo'10 dataset. Experimental results show that the proposed method achieves excellent performance compared with existing crowd counting methods.
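The abstract does not give the details of the VLAD coding layer, so the following is only a minimal NumPy sketch of generic NetVLAD-style soft-assignment aggregation, not the thesis's implementation. The function name `soft_vlad`, the sharpness parameter `alpha`, and the toy feature and cluster-center arrays are all assumptions made for illustration. In a trainable network the centers and assignment weights would be learned. Here they are fixed inputs.

```python
import numpy as np

def soft_vlad(features, centers, alpha=10.0):
    """Aggregate local descriptors into one VLAD vector (illustrative sketch).

    features: (N, D) array of local CNN descriptors (hypothetical input).
    centers:  (K, D) array of cluster centers (hypothetical, fixed here).
    alpha:    assumed sharpness of the soft assignment.
    Returns a (K * D,) L2-normalized vector.
    """
    # Residual of every descriptor to every center: shape (N, K, D).
    residuals = features[:, None, :] - centers[None, :, :]

    # Soft assignment: softmax over centers of -alpha * squared distance.
    sq_dist = np.sum(residuals ** 2, axis=2)              # (N, K)
    logits = -alpha * sq_dist
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)         # rows sum to 1

    # Weighted sum of residuals per center: shape (K, D).
    vlad = np.einsum("nk,nkd->kd", weights, residuals)

    # Intra-normalization per center, then global L2 normalization.
    row_norms = np.linalg.norm(vlad, axis=1, keepdims=True)
    vlad = vlad / np.maximum(row_norms, 1e-12)
    vlad = vlad.ravel()
    return vlad / max(np.linalg.norm(vlad), 1e-12)
```

The output length depends only on the number of centers and the descriptor dimension, not on how many local descriptors are aggregated. That property is one reason this kind of encoding is convenient for inputs of varying size or scale.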
Keywords/Search Tags: crowd counting, convolutional neural network, VLAD, multi-scale, data augmentation