Since the COVID-19 outbreak in late 2019, wearing masks properly has been an important measure for stopping human-to-human transmission of the virus. As epidemic control has succeeded, however, public vigilance about wearing masks has relaxed. Manual supervision of people entering public places wastes human resources, is inefficient, and carries a risk of cross-infection. It is therefore necessary to automate the analysis of the surveillance video already common in public places with a deep-learning-based object detection algorithm, so that whether people wear masks correctly after entering public places can be detected continuously and in real time. Compared with other object detection tasks, however, this one is hard: when pedestrian flow in the video is heavy, the targets indicating whether pedestrians wear masks are very small, the corresponding regions carry very limited information, and the background is relatively complex. Detecting small targets against such complex backgrounds is a difficult problem. To address these problems, this paper adjusts the anchor sizes in the Faster RCNN algorithm and applies multi-scale feature fusion so that it detects better in crowded places. The main work is as follows: (1) After comparing the advantages and disadvantages of several typical convolutional neural networks, ResNet50 is adopted as the backbone network of Faster RCNN for extracting features from objects in images. (2) For the data set used in this paper, the anchor sizes in the Faster RCNN algorithm are adjusted by k-means clustering so that they better fit the target sizes. (3) For small-target detection, multi-scale feature fusion is used to improve the resolution and semantic information of targets in the feature map, thereby improving the detection rate for small targets. Experimental results show that, compared with the basic Faster RCNN algorithm, the proposed algorithm improves mAP by 17.96% and achieves a higher detection rate and classification accuracy for targets of smaller size and lower resolution in images. Moreover, when applied to video detection, it achieves the expected recognition of small and medium-sized targets in the video and can effectively supervise whether pedestrians wear masks correctly.
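
The anchor adjustment in point (2) can be illustrated with a minimal sketch. The abstract does not say which distance measure or how many clusters are used, so the code below assumes a 1 - IoU distance over (width, height) pairs, a choice common in anchor clustering; the function names `iou_wh` and `kmeans_anchors` and the sample boxes are hypothetical and are not taken from the paper.

```python
import random


def iou_wh(box, centroid):
    """IoU of two (width, height) boxes aligned at a shared corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster ground-truth (width, height) pairs into k anchor sizes.

    Assumption: distance is 1 - IoU, so each box joins the centroid
    with which it has the highest IoU.
    """
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # initialise from k distinct boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                w = sum(m[0] for m in members) / len(members)
                h = sum(m[1] for m in members) / len(members)
                new_centroids.append((w, h))
            else:
                # Keep the old centroid if no box was assigned to it.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:  # assignments have stabilised
            break
        centroids = new_centroids
    # Sort by area so the smallest anchor comes first.
    return sorted(centroids, key=lambda c: c[0] * c[1])


if __name__ == "__main__":
    # Made-up box sizes in pixels, for illustration only.
    sample = [(12, 14), (10, 12), (11, 15), (30, 34), (28, 36),
              (33, 31), (70, 80), (64, 78), (75, 85)]
    for w, h in kmeans_anchors(sample, k=3):
        print(f"anchor: {w:.1f} x {h:.1f}")
```

In practice the resulting sizes would replace the default anchor scales and aspect ratios of the region proposal network; how the paper maps clusters to anchors is not stated in the abstract.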