| With the rapid development of artificial intelligence, machine learning, deep learning, and related technologies have been widely applied in healthcare, education, transportation, retail, and other fields. Among these applications, counting people indoors by means of object detection has long been an active research topic. At present, most mainstream object detection algorithms are based on deep learning models and can be roughly divided into classification-based and regression-based methods. Classification-based algorithms, also known as two-stage models, split the detection problem into two steps: they first select candidate region proposals, then classify and localize each candidate region, and finally output the detection results. The most representative are the R-CNN family of algorithms, such as R-CNN, SPP-Net, Fast R-CNN, and Faster R-CNN. Regression-based algorithms, also known as one-stage models, simplify object detection into a unified end-to-end regression problem, so that the position and class of each object are obtained in a single pass over the image. Compared with two-stage models based on region extraction, one-stage models share features within a single training process, which improves running speed and recognition accuracy; the most representative are the YOLO series and SSD. In indoor surveillance footage, people occlude one another, lighting is uneven, and target features are blurred. As a result, existing object detection algorithms often miss people or mistake non-human objects for people, which leads to a low recognition rate and high missed-detection and false-detection rates, so people counting still faces great challenges. To solve this problem, this
paper carries out a series of studies on indoor occupancy statistics based on the YOLOv3 algorithm. The main work is as follows: (1) An indoor people counting model based on global attention is proposed. The coordinate attention mechanism is introduced into the backbone network of the YOLOv3 object detection algorithm, strengthening detection by extracting more features of small or blurred heads. The experimental results show that the improved network model achieves higher recall and average precision. (2) To let the network obtain more feature information and better detect blurred or small objects, the feature fusion network and multi-scale detection network of YOLOv3 are improved so that shallow features are fully learned, and the F-YOLOv3 model is proposed. In this model, a 104×104 feature map output is added and the 13×13 feature map output is removed; the feature map upsampled at each layer of the original network is upsampled once more, and the result is concatenated with the feature map of the corresponding size in the original network; the five convolutions in front of the output layer are replaced with one convolution and two residual units to extract more feature information and enhance the ability to detect blurred or small objects; finally, an adiou loss branch is added to measure the localization accuracy of the detection box. The experimental results show that the F-YOLOv3 model achieves higher recall and average precision, and that people counting performance in indoor scenes is significantly improved. |
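
The change of detection scales described in (2) can be illustrated with a little arithmetic. The sketch below is not the authors' implementation. It assumes a 416×416 input (the size at which standard YOLOv3 yields a 13×13 grid), and it assumes that F-YOLOv3 keeps the 26×26 and 52×52 outputs alongside the new 104×104 one. The helper names and the 12-pixel head width are hypothetical and chosen only for illustration.

```python
# Assumption: 416x416 input, consistent with the 13x13 and 104x104 grids in the text.
INPUT_SIZE = 416


def grid_sizes(input_size, strides):
    """Return the side length of the detection grid for each downsampling stride."""
    return [input_size // s for s in strides]


def cells_spanned(object_px, stride):
    """Return how many grid cells an object of the given pixel width spans."""
    return object_px / stride


yolov3_strides = [32, 16, 8]   # standard YOLOv3 detection scales
f_yolov3_strides = [16, 8, 4]  # assumed: stride 32 removed, stride 4 added

print("YOLOv3 grids:  ", grid_sizes(INPUT_SIZE, yolov3_strides))
print("F-YOLOv3 grids:", grid_sizes(INPUT_SIZE, f_yolov3_strides))

# Hypothetical 12-pixel-wide head: compare the coarsest removed scale
# with the finest added scale.
head_px = 12
print("cells at stride 32:", cells_spanned(head_px, 32))
print("cells at stride 4: ", cells_spanned(head_px, 4))
```

Under these assumptions, a small head occupies less than half a cell on the 13×13 grid but spans several cells on the 104×104 grid, which is the intuition behind shifting the detection heads toward shallower, higher-resolution feature maps.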