
Crowd Counting Based On Convolutional Neural Network

Posted on: 2023-06-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Q He
GTID: 1528306902453734
Subject: Control Science and Engineering
Abstract/Summary:
With the steady progress of urbanization and rapid population growth, crowd gatherings have become increasingly frequent, posing serious hidden dangers to public safety. To prevent emergencies and safeguard human lives and property, intelligent surveillance systems have attracted significant attention. As an important component of such systems, crowd counting estimates the number of individuals in a given scene, supporting efficient analysis and decision-making. It plays a pivotal role in fields such as urban planning and traffic control. In recent years, COVID-19 has swept the world, and monitoring and analyzing high-density crowds can inform effective evacuation plans, which are necessary to control the spread of the epidemic. As deep learning has matured in computer vision, Convolutional Neural Networks have greatly improved counting accuracy and generalization performance. However, existing counting algorithms still face several severe challenges in real-world applications, such as large scale variations, crowd feature misalignment and semantic imbalance, and redundant parameters and architectures. To cope with these challenges, we study crowd counting algorithms and explore innovative solutions that further improve the performance and robustness of counting models. The main contributions and innovations are summarized as follows:
1. To handle the dramatic scale variations in surveillance scenes, we propose a novel approach named the Jointly Attention Network for crowd counting, which recognizes pedestrians of different sizes. First, the Multi-order Scale Attention module explores meaningful high-order interactions and explicitly helps the backbone network obtain more discriminative, scale-aware features. Second, the Multi-pooling Relational Channel Attention module compactly represents pairwise relations from global structure patterns and mines the interdependence among all channel-wise nodes, which helps the model understand the distribution of pedestrians and background. Finally, the Distributed Combinatorial Loss is designed to provide distributed supervision on the intermediate layers at each level, which strengthens back-propagation and avoids the vanishing gradient problem. We conduct extensive experiments on multiple crowd counting datasets, including ShanghaiTech, UCF-QNRF, JHU-CROWD++ and NWPU-Crowd. The experimental results indicate that the proposed method achieves superior counting accuracy and generalization ability.
2. To address the feature misalignment and semantic imbalance caused by multi-path fusion architectures, we propose the Multiple Refinement Fusion Network for crowd counting. This algorithm consists of a strong baseline network together with the Inter-dimensional Refinement Module and the Cross-semantic Refinement Module. In particular, the Inter-dimensional Refinement Module adopts a triple attention mechanism to capture cross-dimension interactions and learns transformation point offsets to jointly repair the serious spatial misalignment between adjacent features. Meanwhile, the Cross-semantic Refinement Module explicitly learns the semantic interdependence among features of different levels to relieve the semantic imbalance caused by coarse context fusion and to enhance context description ability. Both modules are integrated into the baseline network, which is based on a top-down pyramid architecture. Experimental results on multiple crowd counting datasets demonstrate its superiority in improving the effectiveness of feature fusion and strengthening robustness.
3. The deployment of crowd counting models is largely hindered by redundant parameters and bloated structures. To address this challenge, we tackle the issue from two perspectives: lightweight network structure design and pruning of large networks.
(1) We propose a density estimation algorithm with a lightweight architecture, named the Ghost Attention Guided Efficient Crowd Counting Network. We adopt the Ghost Encoder Network as the backbone to extract crowd features, which significantly reduces floating point operations (FLOPs) at the cost of a slight performance drop. On this basis, the Cross-order Ghost Attention Module is specifically designed to capture discriminative information, and its parameter-sharing mechanism ensures efficient computation. Meanwhile, we use the Weight-sharing Mask Density Producer to generate multi-scale density maps and foreground-background masks, relieving complex background interference.
(2) For the two crowd counting networks proposed above, we introduce a channel-level pruning framework and a sparsity training strategy that produce thin models with competitive accuracy and faster inference.
(3) Extensive experiments on multiple crowd counting datasets verify the effectiveness of the above two methods. We also compare their advantages and disadvantages from several aspects and describe their application scenarios in detail.
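The abstract relies on density-map estimation without spelling it out. A minimal sketch, assuming the common formulation in which each annotated head is replaced by a normalized Gaussian and the predicted count is the sum of the map. The fixed `sigma` and the helper names `gaussian_density_map`, `count_from_density` and `mae_mse` are illustrative, not taken from the dissertation. The MAE/MSE pair follows the usual crowd-counting convention, in which "MSE" denotes the root of the mean squared count error.

```python
import numpy as np

def gaussian_density_map(points, shape, sigma=4.0):
    """Place one normalized 2D Gaussian per annotated head (x, y).

    Each kernel is renormalized over the pixels inside the image, so the
    map sums to len(points) even for heads near the border.
    (Fixed sigma is an assumption; adaptive kernels are also common.)
    """
    h, w = shape
    density = np.zeros((h, w), dtype=np.float64)
    ys, xs = np.mgrid[0:h, 0:w]
    for x, y in points:
        kernel = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        density += kernel / kernel.sum()
    return density

def count_from_density(density):
    # The estimated crowd count is the integral (sum) of the density map.
    return float(density.sum())

def mae_mse(pred_counts, gt_counts):
    # Crowd-counting "MSE" is conventionally the root of the mean squared error.
    pred = np.asarray(pred_counts, dtype=np.float64)
    gt = np.asarray(gt_counts, dtype=np.float64)
    err = pred - gt
    return float(np.abs(err).mean()), float(np.sqrt((err ** 2).mean()))

if __name__ == "__main__":
    heads = [(10, 12), (30, 40), (62, 5)]
    dmap = gaussian_density_map(heads, shape=(64, 64))
    print(round(count_from_density(dmap), 6))
    print(mae_mse([98.0, 205.0], [100.0, 200.0]))
```

A network trained on such targets is evaluated only through the summed count, which is why MAE and MSE are computed on counts rather than on pixels.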
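The Multi-order Scale Attention module is described only as exploring high-order interactions. One hypothetical reading, sketched below with NumPy, treats the k-th order term as the element-wise product of k 1x1 projections of the feature map and turns the accumulated terms into a sigmoid attention map. The function `multi_order_attention`, the projection weights and the summation rule are all assumptions for illustration; the dissertation's actual module may differ.

```python
import numpy as np

def conv1x1(x, weight):
    # x: (C_in, H, W), weight: (C_out, C_in) -> (C_out, H, W)
    return np.einsum("oc,chw->ohw", weight, x)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multi_order_attention(x, weights):
    """Hypothetical multi-order attention.

    x: (C, H, W); weights: list of (C, C) 1x1 projection matrices.
    The k-th order term is the element-wise product of the first k
    projections of x; all terms are summed and squashed into an attention map.
    """
    term = np.ones_like(x)
    aggregate = np.zeros_like(x)
    for w in weights:
        term = term * conv1x1(x, w)   # multiplying in one more projection raises the order by one
        aggregate = aggregate + term
    attention = sigmoid(aggregate)    # values in (0, 1)
    return x * attention, attention
```

Passing three weight matrices yields first-, second- and third-order terms; in a trained network the weights would be learned, which this sketch does not cover.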
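The Multi-pooling Relational Channel Attention module can be approximated, under stated assumptions, as follows: each channel is treated as a node described by its average- and max-pooled responses, pairwise relations are a row-wise softmax of scaled dot products between node descriptors, and each channel's weight comes from aggregating the descriptors of related nodes. The scaled dot-product affinity and the hypothetical projection vector `proj` are stand-ins chosen for this sketch, not the module's published definition.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def relational_channel_attention(x, proj):
    """x: (C, H, W); proj: (2,) hypothetical weights mapping each channel's
    relational descriptor to a scalar logit."""
    c = x.shape[0]
    flat = x.reshape(c, -1)
    # Multi-pooling: each channel node is described by its mean and max response.
    nodes = np.stack([flat.mean(axis=1), flat.max(axis=1)], axis=1)          # (C, 2)
    # Pairwise relations between all channel nodes (scaled dot-product affinity).
    relations = softmax(nodes @ nodes.T / np.sqrt(nodes.shape[1]), axis=1)  # (C, C)
    # Each channel aggregates the descriptors of the channels it relates to.
    context = relations @ nodes                                              # (C, 2)
    weights = 1.0 / (1.0 + np.exp(-(context @ proj)))                       # (C,)
    return x * weights[:, None, None], relations
```

Because every row of `relations` sums to one, each channel's context is a convex combination of all channel descriptors, which is one way to encode global channel interdependence.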
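Distributed supervision on intermediate layers is commonly realized as deep supervision: each level predicts its own density map and is compared with a ground truth downsampled to the same resolution. The sketch below assumes a per-level MSE, sum-pooling for downsampling (which preserves the count) and hand-picked level weights. `deep_supervision_loss` is a hypothetical name, and the exact form of the Distributed Combinatorial Loss is not given in the abstract.

```python
import numpy as np

def sum_pool(density, factor):
    # Sum-pooling keeps the total count unchanged when downsampling.
    h, w = density.shape
    assert h % factor == 0 and w % factor == 0
    return density.reshape(h // factor, factor, w // factor, factor).sum(axis=(1, 3))

def deep_supervision_loss(preds, gt, weights):
    """preds: density maps from intermediate levels, each an integer
    downsampling of gt; weights: hypothetical per-level loss weights."""
    assert len(preds) == len(weights)
    total = 0.0
    for pred, lam in zip(preds, weights):
        factor = gt.shape[0] // pred.shape[0]   # e.g. 1, 2, 4 for strides 1, 2, 4
        target = sum_pool(gt, factor)
        total += lam * float(np.mean((pred - target) ** 2))
    return total
```

Since every level receives its own loss term, gradients reach the early layers directly instead of only through the final output, which is the usual argument for why such supervision eases vanishing gradients.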
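The triple attention mechanism in the Inter-dimensional Refinement Module is described as capturing cross-dimension interaction, which matches the idea of triplet attention (Misra et al., WACV 2021): three branches rotate the tensor so that pooling runs over C, H or W in turn, and each branch gates the two remaining dimensions. The NumPy sketch below assumes that design, omits the batch normalization used in the original triplet attention, and uses illustrative kernel sizes; it is not the dissertation's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_same(x, kernel):
    """Single-output 'same' cross-correlation. x: (C, A, B); kernel: (C, k, k), k odd."""
    _, a, b = x.shape
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(x, ((0, 0), (p, p), (p, p)))   # zero padding keeps A x B
    out = np.zeros((a, b))
    for i in range(k):
        for j in range(k):
            out += np.einsum("c,cab->ab", kernel[:, i, j], padded[:, i:i + a, j:j + b])
    return out

def zpool(x):
    # Z-pool: stack max and mean over the leading axis, (D, A, B) -> (2, A, B).
    return np.stack([x.max(axis=0), x.mean(axis=0)], axis=0)

def gate(x, kernel):
    # Attention over the two trailing axes, shared along the pooled leading axis.
    att = sigmoid(conv2d_same(zpool(x), kernel))
    return x * att[None, :, :]

def triplet_attention(x, kernels):
    """x: (C, H, W); kernels: three (2, k, k) arrays, one per branch."""
    # Branch 1: pool over C -> spatial (H, W) attention.
    out_hw = gate(x, kernels[0])
    # Branch 2: rotate so H is pooled -> (C, W) interaction.
    out_cw = np.transpose(gate(np.transpose(x, (1, 0, 2)), kernels[1]), (1, 0, 2))
    # Branch 3: rotate so W is pooled -> (H, C) interaction.
    out_hc = np.transpose(gate(np.transpose(x, (2, 1, 0)), kernels[2]), (2, 1, 0))
    # The three branch outputs are averaged.
    return (out_hw + out_cw + out_hc) / 3.0
```

Both permutations used here are their own inverses, so applying the same transpose after gating restores the (C, H, W) layout.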
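The learned transformation point offsets can be illustrated with offset-based resampling: a small network (not shown) would predict a per-pixel offset field, and the upsampled coarse feature is bilinearly resampled at the shifted positions before fusion, similar in spirit to flow-based feature alignment. `bilinear_warp` is a hypothetical helper; border clamping and offsets measured in pixels are assumptions of this sketch.

```python
import numpy as np

def bilinear_warp(feat, offsets):
    """Resample feat (C, H, W) at positions (y + dy, x + dx).

    offsets: (2, H, W) with offsets[0] = dy and offsets[1] = dx in pixels.
    Sample positions outside the map are clamped to the border.
    """
    _, h, w = feat.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    sy = np.clip(ys + offsets[0], 0, h - 1)
    sx = np.clip(xs + offsets[1], 0, w - 1)
    y0 = np.floor(sy).astype(int)
    x0 = np.floor(sx).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = sy - y0   # fractional parts act as interpolation weights
    wx = sx - x0
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy
```

Because bilinear sampling is differentiable with respect to the offsets, the offset predictor can be trained end to end by the counting loss.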
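For the top-down pyramidal baseline, a plain feature-pyramid decoder upsamples the coarser map and adds it to the finer one. The sketch below swaps that addition for a learned per-position gate computed from both levels, as one hypothetical way a cross-semantic refinement could balance high-level semantics against low-level detail. `top_down_fuse`, nearest-neighbour upsampling and the gate weights are assumptions, not the dissertation's design.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour upsampling of a (C, H, W) map to (C, 2H, 2W).
    return x.repeat(2, axis=1).repeat(2, axis=2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def top_down_fuse(features, gate_weights):
    """features: list ordered fine to coarse, each (C, H_i, W_i) with H_i halving.
    gate_weights[i]: hypothetical (C, 2C) 1x1 weights used when fusing into features[i]."""
    assert len(gate_weights) == len(features) - 1
    fused = features[-1]
    outputs = [fused]
    for feat, w in zip(reversed(features[:-1]), reversed(gate_weights)):
        up = upsample2x(fused)
        # Cross-level gate: how much coarse semantics to inject at each position/channel.
        g = sigmoid(np.einsum("oc,chw->ohw", w, np.concatenate([feat, up], axis=0)))
        fused = g * up + (1.0 - g) * feat
        outputs.append(fused)
    return outputs[::-1]   # fine to coarse, matching the input order
```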
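The Ghost Encoder Network builds on the Ghost module from GhostNet (Han et al., CVPR 2020), in which a primary convolution produces m = n/s intrinsic maps and cheap depthwise operations generate the remaining ghost maps. The sketch shows a ratio s = 2 module with a 1x1 primary convolution and 3x3 depthwise cheap operations, plus multiply-accumulate counts that make the FLOPs saving concrete. The helper names are hypothetical, and the dissertation's encoder configuration is not specified in the abstract.

```python
import numpy as np

def depthwise3x3(x, kernels):
    # x: (C, H, W), kernels: (C, 3, 3); zero padding keeps H and W.
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(3):
        for j in range(3):
            out += kernels[:, i, j][:, None, None] * padded[:, i:i + h, j:j + w]
    return out

def ghost_module(x, primary_w, cheap_k):
    """Ghost module with ratio s = 2: a 1x1 primary conv makes m intrinsic
    maps, and one depthwise 3x3 'cheap' op per map makes m ghost maps."""
    intrinsic = np.einsum("oc,chw->ohw", primary_w, x)   # (m, H, W)
    ghosts = depthwise3x3(intrinsic, cheap_k)            # (m, H, W)
    return np.concatenate([intrinsic, ghosts], axis=0)   # (2m, H, W)

def conv_macs(c_in, c_out, h, w, k):
    # Multiply-accumulates of a standard k x k convolution.
    return c_out * h * w * c_in * k * k

def ghost_macs(c_in, c_out, h, w, k, s, d):
    # Primary conv for m = c_out / s maps plus (s - 1) cheap d x d ops per map.
    m = c_out // s
    return m * h * w * c_in * k * k + (s - 1) * m * h * w * d * d

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, 32, 32))
    y = ghost_module(x, rng.standard_normal((8, 16)), rng.standard_normal((8, 3, 3)))
    print(y.shape)  # (16, 32, 32)
    ratio = conv_macs(64, 128, 56, 56, 3) / ghost_macs(64, 128, 56, 56, 3, s=2, d=3)
    print(round(ratio, 3))  # 1.969
```

With c input channels and a cheap kernel of the same size as the primary kernel, the speed-up over a standard convolution is s·c/(c + s − 1), which approaches s for large c; for c = 64 and s = 2 it is 128/65 ≈ 1.97.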
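The Weight-sharing Mask Density Producer is described as generating multi-scale density maps together with foreground-background masks. A hypothetical sketch: one 1x1 head with shared weights is applied to every scale and outputs a non-negative density channel and a sigmoid mask channel, and the mask suppresses density in background regions. Mask targets are derived by thresholding the ground-truth density and trained with binary cross-entropy; the threshold, the multiplicative masking and all function names are assumptions.

```python
import numpy as np

def shared_head(feat, w, b):
    """Hypothetical shared 1x1 head.

    feat: (C, H, W); w: (2, C); b: (2,). Channel 0 is a raw density and
    channel 1 a foreground logit. Reusing the same (w, b) at every scale
    is the weight-sharing assumption.
    """
    out = np.einsum("oc,chw->ohw", w, feat) + b[:, None, None]
    density = np.maximum(out[0], 0.0)          # ReLU keeps density non-negative
    mask = 1.0 / (1.0 + np.exp(-out[1]))       # foreground probability in (0, 1)
    return density * mask, mask                # background regions are suppressed

def multi_scale_predict(features, w, b):
    # features: list of (C, H_i, W_i) maps; the same head serves every scale.
    return [shared_head(f, w, b) for f in features]

def mask_targets(gt_density, threshold=1e-3):
    # Foreground wherever the ground-truth density is non-negligible.
    return (gt_density > threshold).astype(np.float64)

def bce(prob, target, eps=1e-7):
    # Binary cross-entropy between the predicted mask and its target.
    p = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))
```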
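The channel-level pruning framework with sparsity training resembles network slimming (Liu et al., ICCV 2017), which adds an L1 penalty on batch-normalization scale factors during training and then removes channels whose factors fall below a global threshold. The sketch below assumes that scheme; the abstract does not confirm which sparsity criterion is used, and the helper names are hypothetical.

```python
import numpy as np

def sparsity_subgradient(gammas, lam):
    # Extra gradient from the L1 penalty lam * sum(|gamma|) added to the task loss.
    return lam * np.sign(gammas)

def sparse_sgd_step(gammas, task_grad, lr, lam):
    # One SGD step on BN scale factors with the sparsity term included.
    return gammas - lr * (task_grad + sparsity_subgradient(gammas, lam))

def global_channel_masks(gamma_per_layer, prune_ratio):
    """Keep channels whose |gamma| exceeds a single global percentile threshold."""
    all_g = np.concatenate([np.abs(g) for g in gamma_per_layer])
    threshold = np.quantile(all_g, prune_ratio)
    masks = []
    for g in gamma_per_layer:
        keep = np.abs(g) > threshold
        if not keep.any():                 # never remove a whole layer
            keep[np.argmax(np.abs(g))] = True
        masks.append(keep)
    return masks

def prune_conv_pair(w_curr, w_next, keep):
    """w_curr: (C_out, C_in, k, k); w_next: (C_next, C_out, k, k).
    Dropping output channels of one layer also drops the matching input
    channels of the next layer."""
    return w_curr[keep], w_next[:, keep]
```

The pruned network is usually fine-tuned afterwards to recover accuracy, which is where the competitive accuracy of the resulting thin models would come from.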
Keywords/Search Tags: Computer vision, Crowd counting, Attention mechanism, Feature fusion, Lightweight architecture design, Model compression