With the vigorous development of the social economy and the rapid advance of urbanization, large-scale population movements and gatherings have become commonplace, posing serious threats to public security. To protect people’s lives and property, intelligent monitoring systems that can efficiently analyze crowd images and issue early warnings have emerged, driven by advances in artificial intelligence and computer vision. As a significant component of such systems, crowd counting can estimate the total number and detailed distribution of people in a real scene. The crowd counting task therefore plays an important role in many fields, such as security warning, traffic dispatching, and epidemic prevention and control. In recent years, progress in deep learning and computer hardware has enabled crowd counting algorithms based on convolutional neural networks to achieve a breakthrough in counting performance. However, owing to factors such as cluttered distribution, perspective effects, and background interference, the performance of existing crowd counting algorithms in complex scenes still leaves considerable room for improvement. To address these problems, we conduct research on crowd density perception, multi-scale feature fusion, and background noise suppression, and put forward targeted solutions to improve counting performance in complex scenes. The main contributions and innovations of this dissertation are as follows:

1. For non-uniform crowd distribution, a multi-attention convolutional network is proposed for crowd counting. The algorithm first builds a semantic-spatial trade-off module based on the feature pyramid mechanism, which promotes the balance between high-level semantic information and low-level spatial details. In addition, a lightweight pyramid segmentation attention module is introduced to capture long-distance channel dependencies and
fine-grained spatial information. Next, a global-local context module is built that uses global and local attention to extract fine density-variation features at multiple granularities, resulting in high-quality attention maps. These maps in turn guide the density estimation network to identify pedestrians in different distribution states more accurately. Balancing semantic and spatial information and exploiting global-local context enhances the model's adaptability to density changes and thereby improves counting performance in non-uniformly distributed scenes.

2. To address scale variation and background interference, a crowd density estimation method based on adaptive fusion and multi-task learning is proposed. The algorithm first designs an adaptive module that combines a multi-column structure with a weight learning mechanism; it extracts multi-scale features, learns pixel-level weights, and selects the appropriate scale for each pixel through adaptive fusion. The regression and classification tasks are then combined to build a multi-task module, in which the foreground-background classification task assists the density map regression task in suppressing background noise. Finally, additional supervision is applied to the intermediate information to adjust the hidden-layer features and optimize the learning process, further improving the quality of the estimated density map. The proposed algorithm improves counting performance in scenes with variable scales and complex backgrounds.

3. Extensive experiments are carried out on four mainstream datasets to verify the two proposed methods. To demonstrate the accuracy and robustness of the proposed methods, we conduct experiments on the ShanghaiTech, UCF-QNRF, JHU-CROWD++, and NWPU-Crowd datasets and compare the proposed methods with a variety of advanced algorithms. Extensive ablation and parameter selection experiments are then conducted to further verify
the effectiveness of each substructure and select appropriate hyperparameters for the model.
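
The pixel-level adaptive fusion described in the second contribution can be illustrated with a minimal NumPy sketch. It assumes a softmax over the scale branches at each pixel, which is one common way to realize learned pixel-level weights and is not confirmed to be the exact formulation used in this dissertation. The function name `adaptive_fuse`, the array shapes, and the random inputs are hypothetical and stand in for the outputs of a trained multi-column network.

```python
import numpy as np


def adaptive_fuse(branch_maps, weight_logits):
    """Fuse K per-scale maps using per-pixel softmax weights.

    Both inputs have shape (K, H, W): K scale branches over an H x W grid.
    Assumption: weights are normalized with a softmax across the K branches.
    """
    # Subtract the per-pixel maximum for numerical stability before exponentiating.
    shifted = weight_logits - weight_logits.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)  # weights sum to 1 at every pixel
    # Each output pixel is a convex combination of the K branch responses.
    fused = (weights * branch_maps).sum(axis=0)
    return fused, weights


rng = np.random.default_rng(0)
branch_maps = rng.random((3, 4, 4))          # three hypothetical scale branches
weight_logits = rng.normal(size=(3, 4, 4))   # stand-in for learned weight logits
density, weights = adaptive_fuse(branch_maps, weight_logits)
estimated_count = density.sum()              # summing a density map gives the count estimate
```

Because the weights at each pixel are non-negative and sum to one, every fused value lies between the smallest and largest branch response at that pixel, so a pixel can lean toward whichever scale its weights favor.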