
Research On Auxiliary Information Guided Modeling Of Crowd Counting

Posted on: 2022-08-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Y Yan
GTID: 1528306839478504
Subject: Computer application technology
Abstract/Summary:
Crowd counting is one of the fundamental computer vision tasks. In recent years, due to the rapid growth of urban populations, crowd counting has become increasingly important for building smart cities and for pedestrian and vehicle surveillance. The main goal of crowd counting is to estimate the number of people in an image. Normally, the annotation of a crowd counting image is a dot map that marks each head in the corresponding image. State-of-the-art methods tend to generate density maps by blurring the dot map with fixed Gaussian kernels and then learn the mapping between the input image and the corresponding density map. However, the density map suffers from two intrinsic limitations, i.e., scale variance and dot annotation variance. The first limitation arises mainly from the large scale variation of heads within an image and the lack of head-scale information in the density map. The second arises from the arbitrary location of a dot annotation within the head, which means that the dots lie on different semantic parts of the heads. To alleviate the scale variance of heads in crowd images, we use perspective information to guide adaptive convolution, so as to model continuous scales and ultimately improve robustness to scale variance. In addition, to tackle the annotation variance of labeling, we utilize head-box annotations to construct Sign-MSE within the head regions. As a result, the model trained with Sign-MSE is better at handling dot annotation variance. On the other hand, the available datasets are small, and mainstream approaches aim at higher performance on a single dataset, which results in severe overfitting. In this thesis, by considering the activations of the convolutional kernels of the pretrained model on various images in the datasets, we build domain-specific kernels and adopt them as the guidance information of an attention module, so as to make the model more robust across different datasets. The main research contents and contributions are summarized as follows.

(1) To tackle scale variance in crowd counting, we propose Perspective-guided Gaussian-blurring Convolution (PGC). Benefiting from the auxiliary information in perspective maps, PGC explicitly performs spatially-variant Gaussian smoothing on the deep features, resulting in adaptive receptive-field allocation. For datasets in which perspective maps are not available, we build a perspective estimation branch, and the estimated perspective maps are used as the guidance for PGC. Experiments and analysis show that PGC effectively alleviates scale variance, resulting in better crowd counting performance.

(2) Although PGC alleviates scale variation to some extent, it suffers from a high computational burden and severe feature confusion. To this end, we propose Perspective-guided Fractional-dilation Convolution (PFC). PFC generalizes ordinary dilated convolution by allowing the dilation rate to be any fractional number. Compared with PGC, PFC is computationally lighter, free of feature confusion, and better at modeling continuous scales. Experiments show that PFC enhances robustness to scale variance and surpasses PGC in both accuracy and runtime.

(3) To tackle the dot annotation variance of labeling, we exploit auxiliary head-box annotations and propose Sign-MSE to improve the robustness of the network to annotation variance. In addition, to cope with the absence of head-box annotations in some datasets, we build a head-box prediction network guided by dot annotations. Note that head boxes are predicted only for training images: because dot annotations are available for the training images, their corresponding head boxes can be estimated. Experiments show that Sign-MSE handles dot annotation variance, leading to better performance on the evaluation datasets.

(4) To improve the robustness of mainstream crowd counting methods, we merge the training images of multiple datasets (observed domains) and use them to train a single model. We then compute impact scores (i.e., the importance of each convolutional kernel in the pretrained model) and identify the auxiliary domain-specific kernels. By constructing an attention module guided by these domain-specific kernels, the trained model performs well on the multiple datasets (observed domains). Experiments show that our method performs satisfactorily not only on the observed domains but also on unseen ones.
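The abstract notes that most state-of-the-art methods build density maps by blurring the dot map with a fixed Gaussian kernel. A minimal NumPy sketch of that preprocessing step follows. The function names, the default sigma = 4 and the 3-sigma truncation radius are illustrative assumptions, not values taken from the thesis. Because every head receives the same sigma regardless of its apparent size, the sketch also makes concrete what the abstract calls the lack of head-scale information in the density map.

```python
import numpy as np

def gaussian_kernel(sigma: float, radius: int) -> np.ndarray:
    """2-D isotropic Gaussian on a (2r+1)x(2r+1) grid, normalized to sum to 1."""
    ax = np.arange(-radius, radius + 1, dtype=np.float64)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def density_from_dots(points, shape, sigma=4.0):
    """Blur a dot map with one fixed Gaussian kernel (assumed sigma).

    points: iterable of (row, col) head positions as integer pixel coordinates.
    shape:  (H, W) of the output density map.
    Each head contributes a total mass of 1: the kernel is clipped at the
    image border and renormalized, so the map sums to the head count.
    """
    h, w = shape
    radius = int(3 * sigma)                 # truncate the Gaussian at 3 sigma
    base = gaussian_kernel(sigma, radius)
    density = np.zeros((h, w), dtype=np.float64)
    for r, c in points:
        # Image window covered by the kernel, clipped to the image bounds.
        r0, r1 = max(r - radius, 0), min(r + radius + 1, h)
        c0, c1 = max(c - radius, 0), min(c + radius + 1, w)
        # Matching sub-window of the kernel.
        patch = base[r0 - (r - radius): r1 - (r - radius),
                     c0 - (c - radius): c1 - (c - radius)]
        density[r0:r1, c0:c1] += patch / patch.sum()
    return density
```

For example, `density_from_dots([(10, 10), (0, 0), (63, 40)], (64, 64)).sum()` equals 3 up to floating-point rounding, since each head keeps unit mass even when its kernel is clipped at the border.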
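Contribution (1) describes PGC only at a high level: spatially-variant Gaussian smoothing of deep features, guided by a perspective map. The sketch below illustrates that idea under stated assumptions. Each output pixel is a Gaussian-weighted average of its neighbourhood, with the blur width set from the perspective value at that pixel through a hypothetical linear mapping sigma = alpha * p. The names `alpha` and `radius`, the single-channel input and the per-pixel loop are all illustrative choices; a practical layer would process multi-channel tensors on a GPU. Building a fresh kernel at every pixel in this naive version also gives some intuition for the computational cost that the abstract attributes to PGC.

```python
import numpy as np

def perspective_guided_blur(feat, persp, alpha=0.5, radius=3):
    """Spatially-variant Gaussian smoothing of a 2-D feature map (sketch).

    feat:  (H, W) feature map (one channel, for clarity).
    persp: (H, W) perspective map, e.g. pixels per metre at each location.
    sigma = alpha * persp is a hypothetical mapping: locations with larger
    heads get a wider blur, i.e. a larger effective receptive field.
    """
    h, w = feat.shape
    padded = np.pad(feat, radius, mode="reflect")
    ax = np.arange(-radius, radius + 1, dtype=np.float64)
    d2 = ax[:, None] ** 2 + ax[None, :] ** 2        # squared distance to centre
    out = np.empty_like(feat, dtype=np.float64)
    for i in range(h):
        for j in range(w):
            sigma = max(alpha * persp[i, j], 1e-3)   # avoid division by zero
            k = np.exp(-d2 / (2.0 * sigma ** 2))
            k /= k.sum()                             # normalized per-pixel kernel
            window = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            out[i, j] = (k * window).sum()
    return out
```

With a near-zero perspective value the kernel collapses to a delta and the feature passes through unchanged; larger values spread each activation over a wider neighbourhood.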
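Contribution (2) says PFC lets the dilation rate take any fractional value. One simple way to realize that, assumed here rather than taken from the thesis, is to linearly interpolate between the outputs of the two neighbouring integer dilations: a rate of 1.5 gives the average of dilation-1 and dilation-2 results. The sketch applies a per-pixel dilation map, which in the perspective-guided setting would be derived from the perspective map by some mapping not specified in the abstract. The function names and the 3x3 single-channel setting are illustrative.

```python
import numpy as np

def dilated_conv3x3(x, k, d):
    """'Same'-size 3x3 cross-correlation of 2-D x with weights k at integer dilation d >= 1."""
    h, wid = x.shape
    p = np.pad(x, d)                      # zero padding keeps the output size equal to x
    out = np.zeros((h, wid), dtype=np.float64)
    for u in range(3):
        for v in range(3):
            # Tap (u, v) samples x at offset ((u - 1) * d, (v - 1) * d).
            out += k[u, v] * p[u * d: u * d + h, v * d: v * d + wid]
    return out

def fractional_dilation_conv(x, k, dil):
    """Per-pixel fractional dilation by linear interpolation between the two
    adjacent integer dilations (an assumed realization, not the thesis code).

    dil: (H, W) array of dilation rates >= 1.
    """
    lo = np.floor(dil).astype(int)
    frac = dil - lo                       # weight given to the next integer dilation
    out = np.zeros(x.shape, dtype=np.float64)
    for d in np.unique(lo):
        mask = lo == d
        y_lo = dilated_conv3x3(x, k, int(d))
        y_hi = dilated_conv3x3(x, k, int(d) + 1)
        out[mask] = ((1.0 - frac) * y_lo + frac * y_hi)[mask]
    return out
```

With `dil` filled with 2.0 the result matches an ordinary dilation-2 convolution, and with 1.5 it is the average of the dilation-1 and dilation-2 outputs. The cost is two integer-dilation convolutions per distinct integer part of the rate, rather than one kernel per pixel.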
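Contribution (4) computes an impact score for every kernel of the pretrained model and selects domain-specific kernels, but the abstract does not give the scoring formula. The sketch below uses one hypothetical choice purely for illustration: a kernel's score on a domain is its mean absolute (spatially pooled) activation over that domain's images, and a kernel is marked specific to a domain when its score there exceeds `ratio` times its average score over all domains. The scoring rule, the `ratio` threshold and the function names are assumptions; the resulting per-domain mask is the kind of signal that could guide an attention module.

```python
import numpy as np

def impact_scores(activations, domain_ids, num_domains):
    """Hypothetical impact score: mean absolute activation of each kernel, per domain.

    activations: (N, C) array, e.g. spatially pooled responses of C kernels on N images.
    domain_ids:  (N,) integer domain label of each image; every domain needs >= 1 image.
    Returns a (num_domains, C) score matrix.
    """
    scores = np.zeros((num_domains, activations.shape[1]))
    for dom in range(num_domains):
        scores[dom] = np.abs(activations[domain_ids == dom]).mean(axis=0)
    return scores

def domain_specific_kernels(scores, ratio=1.5):
    """Boolean (num_domains, C) mask: kernel c is treated as specific to domain k
    when its score on k exceeds `ratio` times its mean score over all domains."""
    mean = scores.mean(axis=0, keepdims=True)
    return scores > ratio * mean
```

For instance, if kernel 0 responds strongly only on domain 0 and kernel 1 only on domain 1 while kernel 2 responds equally everywhere, the mask flags kernel 0 for domain 0, kernel 1 for domain 1, and kernel 2 for neither.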
Keywords/Search Tags: Scene Analysis, Crowd Counting, Multi-domain Crowd Counting, Perspective Map, Dilated Convolution