Estimating the number of objects in an image is a challenging yet rewarding task, with applications in areas such as urban planning and public safety. Among object counting tasks, crowd counting is particularly prominent because of its importance for public security and social development. For example, COVID-19 has spread around the world in recent years, and stampede accidents still occur from time to time; crowd counting technology can help prevent such incidents by monitoring crowd density. However, real-world scenes are complex and diverse, and factors such as background clutter and scale variation degrade the accuracy of crowd counting algorithms. To address these challenges, this paper proposes two crowd counting algorithms.

(1) Because of perspective distortion, the apparent size of a head changes with its position in the image: heads near the camera appear large, and distant heads appear small. Perspective information encodes the distance between the camera and the scene, so embedding it into the network model allows scale variation to be handled more effectively. This paper proposes the perspective-guided point supervision network (PPSNet), which embeds perspective information into a point-supervised network. First, a perspective attention method is built on the perspective information; it focuses on key features and suppresses unimportant ones in both the spatial and channel dimensions. Second, a perspective-guided fusion module, which combines multi-scale features with perspective attention, is proposed to address scale variation and background clutter.

(2) Density map-based methods have been widely used in recent years because of their strong performance. However, they do not always produce accurate, high-quality crowd density maps, owing to occlusion, scale variation, and similar factors. Therefore, this paper proposes a point-supervised scale-adaptive crowd density estimation algorithm (Point-supervised Scale Adaptive Network, PSANet), which uses the ground-truth points directly as training targets and thereby avoids generating inaccurate crowd density maps. Because the output values of PSANet are continuous while the ground-truth points are discrete, this paper proposes an optimal matching loss that builds a matching transfer matrix between the output values and the ground-truth points, ensuring that each ground-truth point is matched to a unique pixel of the output map. To further address the scale problem, this paper proposes a scale-adaptive convolution algorithm, which extracts multi-scale features with convolutions at different scales and then fuses them with an attention mechanism.
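The perspective attention in PPSNet is described only at a high level. The following is a minimal NumPy sketch, not the paper's implementation, assuming a CBAM-style design: a channel gate computed from globally pooled features, followed by a spatial gate whose per-pixel inputs are the channel mean, the channel max, and the perspective map. The function names and the weight arrays `w_channel` and `w_spatial` are hypothetical; in a trained network they would be learned.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def perspective_attention(feat, persp, w_channel, w_spatial):
    """Hypothetical CBAM-style channel + spatial attention using a perspective map.

    feat:      (C, H, W) feature map
    persp:     (H, W) perspective map, assumed normalized to [0, 1]
    w_channel: (C, C) hypothetical channel-attention weights
    w_spatial: (3,) hypothetical weights for the [mean, max, perspective] descriptors
    """
    # Channel attention: global average pooling -> linear map -> sigmoid gate per channel.
    pooled = feat.mean(axis=(1, 2))                  # (C,)
    ch_att = sigmoid(w_channel @ pooled)             # (C,)
    feat = feat * ch_att[:, None, None]

    # Spatial attention: per-pixel descriptors, with the perspective map as an extra cue.
    desc = np.stack([feat.mean(axis=0), feat.max(axis=0), persp])   # (3, H, W)
    sp_att = sigmoid(np.tensordot(w_spatial, desc, axes=1))         # (H, W)
    return feat * sp_att[None, :, :], ch_att, sp_att


rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))
# Assumed convention: rows lower in the image are closer to the camera (larger values).
persp = np.linspace(0.0, 1.0, 8)[:, None].repeat(8, axis=1)
out, ch_att, sp_att = perspective_attention(
    feat, persp, rng.standard_normal((4, 4)), np.array([0.5, 0.5, 1.0]))
print(out.shape)  # (4, 8, 8)
```

Feeding the perspective map into the spatial gate is what lets the attention vary with distance from the camera; with all weights set to zero, both gates reduce to 0.5, which is a convenient sanity check.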
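Under strong simplifying assumptions, a perspective-guided fusion module can be approximated as a per-pixel blend of a fine and a coarse feature map, with the blending weight taken from the perspective map. The sketch below assumes a perspective map normalized to [0, 1] with larger values closer to the camera, and it uses that map directly as the attention weight. Both functions are hypothetical; the actual module may fuse more scales and use learned attention.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def perspective_guided_fusion(fine, coarse, persp):
    """Hypothetical per-pixel blend of two feature scales guided by perspective.

    fine:   (C, H, W) high-resolution, small-receptive-field features
    coarse: (C, H // s, W // s) low-resolution, large-receptive-field features
            (H and W are assumed to be divisible by s)
    persp:  (H, W) perspective map in [0, 1]; larger values = closer to the camera
    """
    factor = fine.shape[1] // coarse.shape[1]
    coarse_up = upsample_nearest(coarse, factor)
    # Assumed rule: near regions (large heads) lean on coarse features,
    # far regions (small heads) lean on fine features.
    w = persp[None, :, :]
    return w * coarse_up + (1.0 - w) * fine


fine = np.zeros((2, 4, 4))
coarse = np.ones((2, 2, 2))
persp = np.linspace(0.0, 1.0, 4)[:, None].repeat(4, axis=1)  # top row far (0), bottom row near (1)
fused = perspective_guided_fusion(fine, coarse, persp)
print(fused[0, :, 0].round(2).tolist())  # [0.0, 0.33, 0.67, 1.0]
```

Because fine features are zero and coarse features are one here, the fused output simply reproduces the perspective weight, which makes the blending rule easy to inspect.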
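The optimal matching loss can be illustrated as a minimum-cost one-to-one assignment between ground-truth points and output pixels. This sketch assumes a per-pixel confidence map, a cost equal to Euclidean distance minus `tau` times confidence (`tau` is a hypothetical trade-off parameter), the Hungarian algorithm for the assignment, and binary cross-entropy on the resulting targets; none of these choices is confirmed as the paper's exact formulation. The matrix `T` plays the role of the matching transfer matrix: each row holds a single 1, and because the assignment is one-to-one, no pixel receives two ground-truth points.

```python
import numpy as np


def hungarian(cost):
    """Minimum-cost assignment of every row to a distinct column (rows <= cols).

    Potential-based O(n^2 m) Hungarian algorithm; returns the column index for each row.
    """
    n, m = cost.shape
    INF = float("inf")
    u, v = np.zeros(n + 1), np.zeros(m + 1)
    p = np.zeros(m + 1, dtype=int)      # p[j]: row (1-based) currently assigned to column j
    way = np.zeros(m + 1, dtype=int)    # predecessor column on the alternating path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = np.full(m + 1, INF)
        used = np.zeros(m + 1, dtype=bool)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1, j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:          # reached a free column
                break
        while j0:                   # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = np.full(n, -1)
    for j in range(1, m + 1):
        if p[j]:
            assign[p[j] - 1] = j - 1
    return assign


def optimal_matching_loss(conf, gt_points, tau=1.0, eps=1e-7):
    """conf: (H, W) confidences in (0, 1); gt_points: (N, 2) float (row, col) coordinates."""
    H, W = conf.shape
    rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pixels = np.stack([rows.ravel(), cols.ravel()], axis=1)                      # (H*W, 2)
    dist = np.linalg.norm(gt_points[:, None, :] - pixels[None, :, :], axis=2)    # (N, H*W)
    cost = dist - tau * conf.ravel()[None, :]
    assign = hungarian(cost)
    # Matching transfer matrix: T[g, q] = 1 iff ground-truth point g is matched to pixel q.
    T = np.zeros_like(cost)
    T[np.arange(len(gt_points)), assign] = 1.0
    target = T.sum(axis=0)          # 1 for matched pixels, 0 elsewhere
    p = np.clip(conf.ravel(), eps, 1 - eps)
    loss = -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))
    return loss, T


conf = np.full((4, 4), 0.5)
gt = np.array([[0.0, 0.0], [3.0, 3.0], [1.0, 2.0]])
loss, T = optimal_matching_loss(conf, gt)
print(T.argmax(axis=1).tolist())  # [0, 15, 6]
```

Because the Hungarian step forces distinct columns, two annotated points at the same location are still assigned to two different pixels; in practice a library solver (for example SciPy's assignment routine) would replace the hand-written loop.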
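One plausible reading of the scale-adaptive convolution is a set of parallel 3x3 convolutions with different dilation rates, whose outputs are fused by attention. That interpretation, the single-channel input, and the per-pixel softmax over branches below are assumptions for illustration; `kernels` and `att_weights` are hypothetical stand-ins for learned parameters.

```python
import numpy as np


def dilated_conv2d(x, kernel, dilation):
    """'Same'-padded 3x3 cross-correlation of a single-channel (H, W) map with a given dilation."""
    H, W = x.shape
    xp = np.pad(x, dilation)  # zero padding by `dilation` keeps the output size equal to the input
    out = np.zeros((H, W))
    for a in range(3):
        for b in range(3):
            out += kernel[a, b] * xp[a * dilation:a * dilation + H,
                                     b * dilation:b * dilation + W]
    return out


def scale_adaptive_conv(x, kernels, dilations, att_weights):
    """Multi-branch dilated convolutions fused by a per-pixel softmax attention over branches.

    kernels:     list of (3, 3) kernels, one per branch (hypothetical, would be learned)
    dilations:   dilation rate per branch, e.g. (1, 2, 3)
    att_weights: (S,) hypothetical scalars that score each branch's response
    """
    feats = np.stack([dilated_conv2d(x, k, d) for k, d in zip(kernels, dilations)])  # (S, H, W)
    logits = att_weights[:, None, None] * feats
    logits -= logits.max(axis=0, keepdims=True)   # numerical stability
    att = np.exp(logits)
    att /= att.sum(axis=0, keepdims=True)         # softmax over the S branches
    return (att * feats).sum(axis=0), att


x = np.arange(49, dtype=float).reshape(7, 7)
identity = np.zeros((3, 3))
identity[1, 1] = 1.0
fused, att = scale_adaptive_conv(x, [identity] * 3, (1, 2, 3), np.array([0.1, 0.2, 0.3]))
print(np.allclose(fused, x))  # True: every branch reproduces x, so any convex weighting does too
```

Larger dilation rates enlarge the receptive field without adding parameters, which is why they are a common choice for covering heads of different sizes; the attention then decides, pixel by pixel, which receptive field to trust.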