| Intelligent technology has become an indispensable part of scenic spot construction, and security is critical to building intelligent scenic spots. As public demand for security grows and the volume of monitoring data expands sharply, manual video supervision can no longer meet practical needs. Its problems fall into three areas: first, staff fatigue causes monitoring failures; second, it cannot support real-time queries, and after-the-fact queries take too long; third, it cannot raise real-time alarms or help prevent abnormal events. With the wide application of deep learning, crowd density estimation algorithms based on convolutional neural networks have achieved excellent performance. In real scenic scenes, however, the crowd scale variation and uneven distribution caused by perspective significantly reduce the accuracy of crowd density estimation. In addition, illumination changes, crowd occlusion, and complex backgrounds in scenic spots strain traditional algorithms for detecting abnormal crowd behavior. To address these problems, this paper presents an effective multi-column crowd counting method that targets crowd scale variation and uneven distribution. For video of a specific scenic area, it presents a video crowd density estimation method based on multi-source feature fusion. It also presents a real-time crowd anomaly detection algorithm that builds on density estimation and uses different kinds of energy information. The specific work of this paper is as follows: 1. Scale-adaptive crowd counting method based on an attention feature fusion mechanism. An attention-based multi-column feature fusion method is proposed to solve the feature homogeneity of standard multi-column networks, enabling each column to focus on regions of the image at a different scale. First, a feature pyramid is built as the backbone to fuse features, so that both the low-level texture features and the high-level semantic features of the crowd can be
obtained at all stages of the network. In each column of the network, dilated convolution kernels with different dilation rates are used so that the columns have multi-scale receptive fields that match features at different input scales. Finally, spatial attention weight constraints are applied to the features of each column, which lets each column focus on different spatial locations, resolves the feature homogenization of traditional multi-column networks, and improves counting accuracy. The method is tested on the public scenic area datasets Shanghai Tech A, Shanghai Tech B, and World Expo’10. The experimental results show that the scheme avoids the feature homogenization of multi-column networks and reduces the MAE to 64.2, 8.1, and 8.3, respectively. 2. Multi-source video crowd counting method based on perspective-guided deformable convolution. This part of the work uses multi-source features to improve the accuracy of crowd counting in monitored scenic scenes. For each such scene, a corresponding perspective map is built and used to guide the dilation rate of the deformable convolution, improving adaptability in video counting tasks. Given the real-time requirements of video monitoring, a trimmed VGG network serves as the backbone. Because two consecutive frames of surveillance video differ little, a backbone with shared weights extracts and fuses the features of two consecutive frames to strengthen the network’s feature representation. Furthermore, a perspective map is constructed independently for each monitoring scene, and a variable-dilation convolution module is built. The perspective map guides the dilation rate of the convolution in different image regions so that the network can adapt to scale changes across the scene. This yields a high-quality crowd density map while also improving counting accuracy. The method is tested on the open scenic area
video datasets Venice and World Expo’10. The results show that the scheme makes effective use of the multi-source features, reducing the MAE to 17.9 and 7.0, respectively. 3. Abnormal behavior detection method based on density maps and perspective perception. To detect abnormal crowd behavior, the crowd density map is used as a high-level semantic description on which the detection method is built. Because the features of a moving crowd are difficult to extract directly, this part of the work uses the density maps generated above as high-level semantic features that describe the moving crowd. The density map sequence is then transformed into crowd energy features. According to the scene’s perspective map, the energy of crowds moving in different areas is weighted to improve accuracy. Next, the frame-difference energy is calculated from the energy features across the video sequence. Finally, the crowd density sequence is converted into a time series of motion features according to the rate of change of the frame-difference energy. Because this transformed time series is statistically stable under normal crowd conditions, the three-sigma rule is introduced to determine the upper and lower bounds of normal energy and to detect abnormal crowd behavior. The method is tested on a public crowd abnormal-behavior detection dataset and achieves an average accuracy of 97.68%. It shows some advantages over traditional methods of extracting crowd feature information. |
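
The first contribution relies on the fact that a larger dilation rate widens the area a convolution kernel covers without adding parameters. The sketch below illustrates that relationship with the standard formula for the effective size of a dilated kernel. The function name `effective_kernel_size`, the 3x3 kernel, and the dilation rates 1, 2, and 3 are illustrative assumptions; the abstract does not state which rates the columns use.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in pixels) covered by one side of a dilated convolution kernel.

    A kernel with `kernel_size` taps spaced `dilation` pixels apart covers
    dilation * (kernel_size - 1) + 1 pixels along each axis.
    """
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must be positive integers")
    return dilation * (kernel_size - 1) + 1


if __name__ == "__main__":
    # Hypothetical dilation rates for three columns (not taken from the source).
    for rate in (1, 2, 3):
        span = effective_kernel_size(3, rate)
        print(f"dilation={rate}: a 3x3 kernel covers {span}x{span} pixels")
```

Columns with different rates therefore see context at different scales, which is what allows them to specialize on near (large) and distant (small) heads.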
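
The MAE figures reported above compare predicted and annotated crowd counts per image. As a minimal sketch of how such a figure is commonly computed in density-based counting, the code below sums each density map to obtain a count and then averages the absolute count errors. The helper names and the toy values are assumptions for illustration and do not reproduce the thesis's evaluation code or data.

```python
def count_from_density(density_map):
    """Estimated head count: the sum of all values in a 2-D density map."""
    return sum(sum(row) for row in density_map)


def mean_absolute_error(predicted_counts, true_counts):
    """Average absolute difference between predicted and annotated counts."""
    if not predicted_counts or len(predicted_counts) != len(true_counts):
        raise ValueError("count lists must be non-empty and of equal length")
    total = sum(abs(p - t) for p, t in zip(predicted_counts, true_counts))
    return total / len(predicted_counts)


if __name__ == "__main__":
    # Toy density maps (hypothetical values, one map per image).
    maps = [
        [[0.5, 0.5], [1.0, 0.0]],
        [[2.0, 3.0], [1.0, 4.0]],
    ]
    predicted = [count_from_density(m) for m in maps]
    annotated = [3, 8]
    print(mean_absolute_error(predicted, annotated))
```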
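
The anomaly detection steps in the third contribution can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define the energy terms, so the code assumes the frame-difference energy is the perspective-weighted sum of absolute per-pixel changes between consecutive density maps, and that its rate of change is the difference between consecutive energies. All function names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def frame_difference_energy(prev_map, cur_map, weight_map):
    """Perspective-weighted sum of absolute density changes between two frames.

    Assumed definition: the source does not give the exact formula.
    """
    return sum(
        w * abs(c - p)
        for prev_row, cur_row, w_row in zip(prev_map, cur_map, weight_map)
        for p, c, w in zip(prev_row, cur_row, w_row)
    )


def change_rates(energies):
    """Rate of change of the energy series (assumed: first difference)."""
    return [b - a for a, b in zip(energies, energies[1:])]


def three_sigma_bounds(normal_values):
    """Lower and upper bounds at mean +/- 3 standard deviations."""
    mu = mean(normal_values)
    sigma = pstdev(normal_values)
    return mu - 3 * sigma, mu + 3 * sigma


def detect_anomalies(values, bounds):
    """Indices of values that fall outside the three-sigma bounds."""
    lower, upper = bounds
    return [i for i, v in enumerate(values) if v < lower or v > upper]


if __name__ == "__main__":
    # Bounds are estimated from rates observed during normal crowd behavior.
    normal_rates = [0.0, 0.1, -0.1, 0.05, -0.05]
    bounds = three_sigma_bounds(normal_rates)
    # New rates are then checked against those bounds.
    print(detect_anomalies([0.0, 0.5, -0.1, -0.3], bounds))
```

Estimating the bounds from a period of normal behavior, rather than from the sequence being tested, keeps a sudden event from inflating the standard deviation and masking itself.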