| Advances in technology and growing attention to public safety have led to the large-scale deployment of surveillance cameras. However, it is difficult for operators to quickly identify a target of interest and extract its location from the large volume of video that these cameras produce. To make video surveillance digital and intelligent, this paper uses computer vision and deep learning methods to implement target detection, tracking, and positioning with a monocular camera. The main research contents and conclusions are as follows: (1) Target detection and tracking based on deep learning. Person detection is performed with the YOLOv5 object detection model, and DeepSORT tracks each detected target to obtain the pixel coordinates of its successive positions. To improve the tracking performance of DeepSORT, the YOLOv5 detector is improved in two ways: the original GIoU loss is replaced with CIoU loss, and the CBAM attention mechanism is added to the YOLOv5 backbone to improve detection accuracy. (2) A coordinate mapping positioning model. The intrinsic and extrinsic parameters of the camera are obtained through camera calibration and auxiliary points. A 3D point cloud model of the monitored area is built to digitize the scene. Based on computer vision theory, a coordinate mapping positioning model is established so that pixel coordinates in the image can be converted into positions in the 3D point cloud. In a verification of the model on sample points, the root mean square error (RMSE) is 21.30 and the mean absolute error (MAE) is 6.48 cm. (3) Target localization experiment based on a monocular camera. The pixel coordinate information obtained by target detection and tracking is
combined with the mapping model constructed for the monitored area to calculate the position of each target in the 3D point cloud, thereby achieving multi-person positioning with a monocular camera. The mean average precision (mAP) of the improved YOLOv5 detection method reaches 0.718, which is 4.5% higher than before the improvement. In the multi-person localization experiments, the RMSE is 18.87 and the MAE is 16.33 cm. The experimental results show that the proposed monocular vision algorithm can achieve multi-person localization. Compared with the traditional visual positioning algorithm, the proposed algorithm reduces the RMSE by 9.8% and the MAE by 15%, and runs 14 times as fast. |
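
The abstract names CIoU loss as the replacement for GIoU loss in the detector. As a rough illustration of what that loss measures, the sketch below computes 1 - CIoU for a single pair of axis-aligned boxes in plain Python. The function name `ciou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` stabilizer are assumptions made for this example; this is not the paper's training code, which would operate on batched tensors inside YOLOv5.

```python
import math


def ciou_loss(box_a, box_b, eps=1e-9):
    """Return 1 - CIoU for two boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; eps only guards against division by zero.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection over union of the two boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1
    union = wa * ha + wb * hb - inter + eps
    iou = inter / union

    # Squared distance between the two box centres.
    rho2 = ((ax1 + ax2) - (bx1 + bx2)) ** 2 / 4 + ((ay1 + ay2) - (by1 + by2)) ** 2 / 4

    # Squared diagonal of the smallest box enclosing both.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(wb / (hb + eps)) - math.atan(wa / (ha + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - (iou - rho2 / c2 - alpha * v)
```

Unlike plain IoU, the centre-distance term still produces a useful signal when the boxes do not overlap, and the aspect-ratio term penalizes predictions whose shape differs from the ground truth.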
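
The RMSE and MAE figures quoted in the abstract summarize how far estimated positions fall from reference positions. The abstract does not say how the per-point error is defined, so the sketch below assumes it is the Euclidean distance between each estimated point and its reference point; the function name `localization_errors` is likewise hypothetical.

```python
import math


def localization_errors(estimated, reference):
    """Return (rmse, mae) over paired points, using Euclidean distance as the error.

    Assumption: error per point is the straight-line distance between the
    estimated and reference coordinates (same units as the inputs).
    """
    if not estimated or len(estimated) != len(reference):
        raise ValueError("need two non-empty point lists of equal length")

    # Per-point distance between estimate and reference.
    dists = [math.dist(p, q) for p, q in zip(estimated, reference)]

    rmse = math.sqrt(sum(d * d for d in dists) / len(dists))
    mae = sum(dists) / len(dists)
    return rmse, mae
```

For any set of errors the RMSE is at least as large as the MAE, and the gap widens when a few points have much larger errors than the rest, which is one way to read a result where the two values differ substantially.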