Worldwide, as urbanization accelerates, the number of urban residents is increasing dramatically, and large-scale crowd gatherings are becoming increasingly common. These scenes carry many security risks and can easily lead to public safety incidents. To prevent such incidents, computer vision-based video surveillance systems are widely deployed in public places throughout cities to monitor and analyze crowd information at each location in real time. As two fundamental tasks of video surveillance systems, large-scale crowd counting and crowd localization play important roles in public safety and have received wide attention from academia. With the gradual maturity of Convolutional Neural Networks, existing crowd counting and localization methods have progressed from handling simple crowd scenes to handling relatively dense ones. However, in scenes with complex environments, these methods still cannot achieve optimal performance because of crowd scale variation and complex backgrounds. The goal of this thesis is therefore to address the scale variation problem and the complex background problem in complex scenes, so as to achieve accurate counting and localization of large-scale crowds. To this end, this thesis first conducts an in-depth study of crowd counting and proposes a crowd counting model based on a multi-scale attention recalibration network. A crowd counting method alone, however, can only analyze how the crowd is distributed across different regions of a scene; it cannot provide the specific location of each person. Building on the study of crowd counting, this thesis therefore extends the research to crowd localization and proposes a crowd localization model based on a multi-scale feature refinement network, targeting the problem that individuals at the edges of the scene appear smaller because of scale variation. The main contributions of this thesis are summarized as follows:

(1) This thesis presents a Multi-scale Attention Recalibration Network for crowd counting, which introduces a feature enhancement module and a feature recalibration module to tackle the scale variation problem and the complex background problem, respectively. First, the feature enhancement module applies multiple dilated convolutions to enhance features at multiple scales, providing rich multi-scale contextual information for subsequent operations. Next, the feature recalibration module combines dimensional attention blocks and regional recalibration blocks to further suppress the background information in these contextual features. The dimensional attention block captures the semantic dependencies of the contextual information across different dimensions, and the regional recalibration block uses these dependencies to reassign attention weights across regions, further lowering the weights of background regions. By combining the two modules, the presented crowd counting method can focus specifically on crowd features in crowd images and accurately estimate crowd density. Experiments on several publicly available crowd counting datasets show that the presented method significantly outperforms most existing methods in both counting accuracy and the quality of the generated density maps.

(2) This thesis presents a Multi-scale Feature Refinement Network for crowd localization, which uses three branches to fully extract the contextual features of crowd scenes at different scales, so that it can accurately capture the information of each individual in the scene. Specifically, the method first introduces a feature perception module, which concatenates several different dilated convolutions to encode a wider range of contextual information at different scales and to improve the robustness of the method to scale variation. A feature refinement module then dynamically promotes the mutual refinement of contextual information among the branches, further improving the representation of multi-scale contextual information. With these two modules, the proposed crowd localization method can localize as many individuals in a crowd scene as possible and robustly handle a wide variety of complex crowd scenes. Extensive experiments on multiple crowd localization datasets show that the presented method significantly outperforms existing methods and achieves state-of-the-art performance.
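Both contributions rely on running dilated convolutions at several dilation rates to gather context at multiple scales. The pure-Python sketch below illustrates only that idea for a single-channel map with a hand-written kernel. The function names `dilated_conv2d`, `effective_kernel_size`, and `multi_scale_features` and the dilation rates (1, 2, 3) are hypothetical and are not taken from the thesis. The actual networks learn their kernels and operate on multi-channel feature maps.

```python
from typing import List, Sequence

Grid = List[List[float]]


def dilated_conv2d(x: Grid, kernel: Grid, dilation: int) -> Grid:
    """Single-channel 2D cross-correlation (as in CNN libraries) with zero
    padding chosen so that the output has the same size as the input."""
    h, w = len(x), len(x[0])
    k = len(kernel)              # assumes a square, odd-sized kernel
    r = (k // 2) * dilation      # how far the kernel reaches from its centre
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    # Kernel taps are spaced `dilation` pixels apart.
                    ii = i - r + a * dilation
                    jj = j - r + b * dilation
                    if 0 <= ii < h and 0 <= jj < w:  # zero padding
                        acc += kernel[a][b] * x[ii][jj]
            out[i][j] = acc
    return out


def effective_kernel_size(k: int, dilation: int) -> int:
    """Side length of the window covered by a k x k kernel at this dilation."""
    return k + (k - 1) * (dilation - 1)


def multi_scale_features(x: Grid, kernel: Grid,
                         dilations: Sequence[int] = (1, 2, 3)) -> List[Grid]:
    """Apply the same kernel at several dilation rates; the returned list plays
    the role of the channels that a real network would concatenate."""
    return [dilated_conv2d(x, kernel, d) for d in dilations]


if __name__ == "__main__":
    # A single bright pixel in the centre of a 9x9 map makes the receptive
    # field of each branch visible in its output.
    impulse = [[0.0] * 9 for _ in range(9)]
    impulse[4][4] = 1.0
    ones = [[1.0] * 3 for _ in range(3)]
    for d, fmap in zip((1, 2, 3), multi_scale_features(impulse, ones)):
        rows = [i for i in range(9) if any(v != 0.0 for v in fmap[i])]
        print(f"dilation={d}: rows {rows}, span {rows[-1] - rows[0] + 1}")
```

Applied to a single-pixel impulse, a 3×3 kernel with dilation d touches a (2d+1)×(2d+1) window while still using only nine weights. This is why dilated convolutions are an inexpensive way to widen the receptive field without downsampling.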
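Contribution (1) evaluates counting accuracy together with the quality of the generated density maps. Density-map-based counting rests on a simple identity: if each annotated person contributes a total mass of 1 to the map, the crowd count equals the sum of the map. The sketch below builds such a map from head annotations using normalized Gaussian kernels, a common convention in density-based crowd counting. The fixed `sigma` and `radius`, the renormalization at image borders, and the function names `gaussian_density_map` and `count_from_density` are assumptions chosen for illustration. They do not describe how this thesis generates its ground truth.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def gaussian_density_map(h: int, w: int, points: Sequence[Tuple[int, int]],
                         sigma: float = 1.5, radius: int = 4) -> Grid:
    """Place one truncated 2D Gaussian per annotated head (row, col).
    Points must lie inside the h x w image."""
    dmap = [[0.0] * w for _ in range(h)]
    for py, px in points:
        # Collect the kernel weights that fall inside the image ...
        cells = []
        for i in range(max(0, py - radius), min(h, py + radius + 1)):
            for j in range(max(0, px - radius), min(w, px + radius + 1)):
                g = math.exp(-((i - py) ** 2 + (j - px) ** 2) / (2 * sigma ** 2))
                cells.append((i, j, g))
        # ... and normalize them so this person adds exactly 1 to the map,
        # even when the kernel is clipped at the image border.
        total = sum(g for _, _, g in cells)
        for i, j, g in cells:
            dmap[i][j] += g / total
    return dmap


def count_from_density(dmap: Grid) -> float:
    """The estimated count is the integral (sum) of the density map."""
    return sum(sum(row) for row in dmap)


if __name__ == "__main__":
    heads = [(2, 3), (10, 10), (0, 0), (15, 7)]
    dmap = gaussian_density_map(16, 16, heads)
    print(f"annotated: {len(heads)}, density sum: {count_from_density(dmap):.6f}")
```

At test time, a counting network predicts the density map directly from the image, and the same summation turns that prediction into an estimated head count.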