Crowd Counting Algorithms Based On Attention Convolutional Neural Network
Posted on: 2020-04-23 | Degree: Doctor | Type: Dissertation | Country: China | Candidate: Y M Zhang | GTID: 1368330572487897 | Subject: Pattern Recognition and Intelligent Systems

Abstract/Summary:

With rapid social development and urbanization, the number of public and civil surveillance systems has grown greatly. To use the resulting video data more effectively, many countries have begun to research intelligent surveillance technology. As one of the most important tasks of an intelligent surveillance system, crowd counting has both theoretical and practical value and has become a research hotspot in computer vision and artificial intelligence in recent years. Crowd counting algorithms have progressed significantly through academic and industrial research. However, the task still faces many challenges in application. First, the images or videos captured by surveillance cameras often contain complex backgrounds, some of which resemble the shape of people's heads; such backgrounds are easily mistaken for crowds. Second, individuals in a crowd move freely, which causes non-uniform distributions and differences in density and increases the counting difficulty. Third, head sizes in surveillance footage vary greatly with shooting distance and angle, which makes it hard to locate head regions. We study crowd counting algorithms that address these three challenges. The main contents and contributions are summarized as follows:

1. We designed a patch-appearance classification task to address individual differences in complex backgrounds, yielding an auxiliary-learning crowd counting algorithm that reduces the misidentification of crowd targets. The contributions of this algorithm are:
(1) The counting algorithm avoids cumbersome tasks such as foreground segmentation and head detection. Image segmentation is the only preprocessing required.
(2) The proposed method combines feature extraction and classification for global optimization. A deep learning model is used to avoid manual feature extraction and regression model design.
(3) The auxiliary task, which is designed based on appearance features, and the crowd counting task hard-share the network parameters. This parameter-sharing pattern can extract multi-contextual features, which help the network focus on head locations during training and increase counting accuracy.

2. We applied an attention mechanism to crowd counting to address the problem of complex backgrounds, yielding a head attention-based crowd counting algorithm that filters non-head regions effectively. The contributions of this algorithm are:
(1) We modified the traditional attention mechanism and made the first attempt to apply it to the crowd counting task. The combination of the attention mechanism and a convolutional neural network is robust to backgrounds and can guide the network to focus on head locations and filter out non-head regions effectively.
(2) We designed a relative error loss that increases the importance of sparse-crowd samples during training, improving counting accuracy for sparse crowds.

3. We designed a multi-resolution attention module, which combines dilated convolution and multiple attention mechanisms, together with auxiliary training to address the problem of non-uniform distribution. The multi-resolution attention network increases counting accuracy effectively, and its contributions are:
(1) The network is robust to non-uniform distribution because it cascades the multi-contextual features trained by the density-level classification and crowd counting tasks.
(2) We modified the attention mechanism by generating attention maps from consecutive convolution layers to strengthen the features of head locations.
(3) We designed a multi-resolution attention mechanism by combining the modified attention mechanism and dilated convolution. The dilated convolution operation can learn features with a large receptive field using fewer parameters. These features not only provide more comprehensive information for generating attention maps but also increase the counting accuracy for non-uniformly distributed crowds by extracting global information.

4. We modified AlexNet by fusing feature maps from different layers for crowd counting to enhance the network's ability to handle multi-scale objects. In addition, we designed a scale adaptive network, which extracts features with a wider variety of receptive fields to address scale variations. The scale adaptive network can enhance the feature channels that have an appropriate receptive field size and suppress the competitiveness of weakly correlated feature channels. The contributions of the scale adaptive network are:
(1) We designed a scale expansion unit, which consists of a traditional convolutional branch and a dilated convolutional branch, to extract multi-scale features.
(2) The scale expansion units are densely connected to further expand the range and density of the receptive fields.
(3) We designed a channel-wise attention unit to selectively enhance the feature channels that have an appropriate receptive field size, thus relieving the negative effect caused by competition between different feature channels.

5. To explore the influence of multi-modal data on network training, we designed a modal-weighted neural network, which can adaptively increase the weights of important modalities. In addition, this network is the basis of the scale adaptive network. The contributions of this network are:
(1) A structural regularization is designed and applied to an auto-encoder to guide the network to assign weights to different modal data. Thus, the network can learn the influence of multi-modal data on network training and use multi-modal data more effectively.
(2) This network is applicable to different classification tasks that have multi-modal data, and users can train it with different network hyper-parameters and multi-modal data.

Keywords/Search Tags: Crowd counting, Convolutional neural network, Auxiliary training mechanism, Attention mechanism, Scale adaptive
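
The abstract's remark that dilated convolution reaches a large receptive field with fewer parameters can be illustrated with a little arithmetic. The sketch below is not taken from the dissertation: the function names are hypothetical, the 64-channel layer in the comparison is an arbitrary assumption, and the 1-D convolution is a toy stand-in for the 2-D layers a counting network would use. It relies only on the standard relation that a kernel of size k with dilation d spans d*(k-1)+1 input positions.

```python
from typing import List


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Input positions spanned by one dilated kernel along one axis."""
    return dilation * (kernel_size - 1) + 1


def conv2d_weight_count(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a square 2-D convolution (bias ignored).

    Dilation does not appear here: it spreads the taps apart
    without adding any weights.
    """
    return kernel_size * kernel_size * in_channels * out_channels


def dilated_conv1d(signal: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Toy 1-D dilated cross-correlation with no padding and stride 1."""
    span = effective_kernel_size(len(kernel), dilation)
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


# A 3x3 kernel with dilation 2 spans the same 5x5 window as a dense 5x5 kernel,
# but keeps the weight count of a 3x3 kernel (64 channels assumed for illustration).
span = effective_kernel_size(3, 2)             # 5
dilated_weights = conv2d_weight_count(3, 64, 64)  # 36864
dense_weights = conv2d_weight_count(5, 64, 64)    # 102400

# Each output sums every second input sample: 1+3+5, 2+4+6, 3+5+7.
outputs = dilated_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 1, 1], 2)  # [9, 12, 15]
```

Under these assumptions the dilated layer covers the same 5x5 window with roughly a third of the weights of the dense 5x5 layer, which is the trade-off the abstract points to when it pairs dilated convolution with attention map generation.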