With continued urbanization and economic growth in our country, urban populations have expanded rapidly and cultural and sporting events are held more often, so large crowds gather more frequently. In extremely dense crowds, pushing and jostling can easily lead to safety accidents that cause heavy casualties and property damage. To avoid such situations, crowd size must be kept within a reasonable range through timely monitoring of pedestrian flow. Consequently, crowd counting methods, which automatically predict the number and density distribution of people in complex scenes, have attracted growing research interest. Despite significant progress in deep learning-based counting methods, several factors still hinder further improvement, including crowd scale variation, uneven density distribution, and background noise interference. This thesis therefore investigates how to achieve more accurate crowd counting in complex scenes and proposes two solutions. The main work and contributions are summarized as follows.

(1) To address continuous scale variation and background noise interference, this thesis proposes a hierarchical scale-aware encoder-decoder network for congested crowd counting. The algorithm is built around rich multi-scale feature encoding and adaptive multi-path feature fusion, and it filters crowd features along different dimensions, making it adaptable to scale variation and robust to background noise. Specifically, the scale-aware encoding network uses cascaded encoding branches to encode multi-scale crowd features hierarchically and performs multi-level information fusion in each branch to preserve the continuity of scale information. During the encoding phase, multi-dimensional adaptive weight generators are introduced to adjust the
network's attention to different regions through different weight parameters, suppressing background noise and redundant information in the features. Finally, the multi-path aggregation decoding network adopts a multi-path feature decoding strategy and introduces spatial and channel guidance modules to adaptively aggregate the rich feature representations and selectively emphasize the most appropriate feature information, further enhancing the robustness of the algorithm.

(2) Having addressed scale variation and noise interference, this thesis further proposes a multi-context collaborative scale-aware aggregation network to handle uneven density distribution and occlusion in congested crowds. The algorithm not only encodes rich crowd feature representations but also models local contextual information over multiple receptive fields and the dependencies between global features, which improves the accuracy of identifying multiple crowd patterns and boosts performance in complex scenes. Specifically, the algorithm constructs several multi-resolution feature encoding stages to encode multi-scale crowds progressively and deploys information filtering modules repeatedly to suppress interfering information in the crowd features. To better identify multiple crowd patterns in complex scenes and reduce local counting errors, a local feature enhancement module is proposed, which enhances the consistency of crowd features within each local area by capturing context in various local receptive fields. In addition, a global semantic modeling module is introduced to compensate for the limitations of convolution by establishing long-range dependencies between global features, enhancing the perception of global information and further boosting performance in complex scenes.

Extensive comparison and ablation experiments are performed on four challenging crowd counting datasets to verify the performance of the networks and the effect of each module. The experimental
results demonstrate the effectiveness and advanced performance of the proposed algorithms.
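
For readers unfamiliar with how a predicted density distribution relates to a crowd count, the following is a minimal sketch of the density-map formulation that is common in the crowd counting literature. It is an illustration under stated assumptions, not the networks proposed in this thesis: the function names, the kernel size, and the fixed `sigma` are hypothetical choices made only for this example. Each annotated head position contributes a small Gaussian whose mass is 1, so summing the map recovers the number of people.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel normalized to sum to 1."""
    half = size // 2
    raw = [
        [math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]


def density_map(height, width, heads, size=5, sigma=1.0):
    """Build a density map from in-bounds (row, col) head annotations.

    Assumption for this sketch: kernel mass that would fall outside the
    image is re-normalized, so every person contributes exactly 1.
    """
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    dmap = [[0.0] * width for _ in range(height)]
    for r, c in heads:
        # Collect the kernel cells that land inside the image.
        cells = []
        for dy in range(size):
            for dx in range(size):
                y, x = r + dy - half, c + dx - half
                if 0 <= y < height and 0 <= x < width:
                    cells.append((y, x, kernel[dy][dx]))
        mass = sum(w for _, _, w in cells)
        for y, x, w in cells:
            dmap[y][x] += w / mass
    return dmap


def crowd_count(dmap):
    """The estimated count is the sum (discrete integral) of the density map."""
    return sum(map(sum, dmap))


if __name__ == "__main__":
    # Three hypothetical head annotations, one of them in a corner.
    heads = [(0, 0), (4, 7), (10, 10)]
    dmap = density_map(16, 16, heads)
    print(round(crowd_count(dmap), 6))
```

In a learned counting model, a network would predict such a map from an image, and the same summation would give the estimated count. Summing over a sub-region gives a local count, which is where uneven density and local counting errors become visible.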