| Object detection refers to locating and identifying objects in input data such as images or videos. It lies at the intersection of machine vision, neural networks, and artificial intelligence, and has broad application prospects in image retrieval, video surveillance, unmanned aerial vehicles, and autonomous driving. With the application of deep learning to object detection, deep-learning-based real-time object detection has developed rapidly; for example, the R-CNN series, SSD, and the YOLO series have driven rapid progress in the field. Because objects in real environments are affected by many factors, real-time object detection remains very challenging: (1) Real environments contain many sources of interference, including objective conditions such as rain and fog as well as object rotation, scaling, and occlusion, all of which affect detection, so the detection process must eliminate the impact of these environmental factors on the object. (2) Images in the scene change continuously in real time, so the detection speed of the system must be improved to meet real-time requirements. (3) Detection accuracy must also be improved: for example, when different subtypes of the same object class must be told apart, the model needs to distinguish them accurately, and the detection system must obtain the category and position of each object at the same time. Thus, while detection speed is maintained, detection accuracy also becomes important.
Based on these problems in real-time object detection, this thesis builds on the YOLO series of algorithms and proposes a unified real-time object detection model. The main contributions are as follows: (1) The detection model regresses directly from the input image to object class scores and object locations. Although object positions change continuously in a real-time scene, each image can still be processed independently. The single-network architecture processes images at 45 fps on the PASCAL VOC 2007 dataset, with good detection accuracy and speed. (2) Memory-mapped information is used to capture inter-frame information: in real-time scenes, an M-frame memory incorporates the detections of the previous M-1 frames, preserving the rich inter-frame information in the video. Because the memory-mapped part is attached to the last layer of the original network, the overall network structure is unaffected, and the model becomes better suited to detecting objects in real-time video streams. (3) To eliminate the influence of environmental factors, a video dehazing module is added to the model. It adopts an image dehazing method based on the dark channel prior to enhance the clarity of the network's input images and reduce the influence of real-world interference on the object, thereby improving the detection accuracy of the model. In the experiments, the model was first pre-trained on the ImageNet dataset and then tested on the KITTI dataset for autonomous driving. This thesis analyzes the performance of the model from several aspects and conducts comparative experiments in multiple scenarios. KITTI is the main dataset of this thesis; to broaden the model's scope of application, corresponding experiments were also performed on three other datasets: PASCAL VOC 2007/2012, the Road Sign dataset, and the FDDB face detection dataset. The experimental results show that the detection model performs well on all of these datasets, and that detection accuracy and detection speed can be flexibly traded off by changing the model. |
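The dehazing module in contribution (3) rests on the dark channel prior. The abstract does not give the module's exact implementation, so the following is only a minimal sketch of the standard dark-channel-prior formulation. It assumes the hazy-image model I(x) = J(x)·t(x) + A·(1 − t(x)), where J is the haze-free scene, t the transmission, and A the atmospheric light. The helper names (`dark_channel`, `atmospheric_light`, `transmission`, `dehaze`) are hypothetical. The defaults (omega = 0.95, t0 = 0.1, a 15×15 patch, the brightest 0.1% of dark-channel pixels) are the commonly cited values, not settings reported by this thesis. The transmission-refinement step (soft matting or a guided filter) is omitted.

```python
from typing import List, Tuple

Pixel = Tuple[float, ...]          # RGB values in [0, 1]
Image = List[List[Pixel]]          # rows of pixels


def dark_channel(img: Image, patch: int = 15) -> List[List[float]]:
    """Per-pixel minimum over RGB channels, followed by a min filter over a patch x patch window."""
    h, w, r = len(img), len(img[0]), patch // 2
    min_rgb = [[min(px) for px in row] for row in img]
    return [
        [
            min(
                min_rgb[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def atmospheric_light(img: Image, dark: List[List[float]], top_frac: float = 0.001) -> Pixel:
    """Among the brightest top_frac of dark-channel pixels, return the input pixel with the highest intensity."""
    h, w = len(img), len(img[0])
    ranked = sorted(((dark[y][x], y, x) for y in range(h) for x in range(w)), reverse=True)
    n = max(1, int(h * w * top_frac))
    _, y, x = max(ranked[:n], key=lambda c: sum(img[c[1]][c[2]]))
    return img[y][x]


def transmission(img: Image, A: Pixel, omega: float = 0.95, patch: int = 15) -> List[List[float]]:
    """Estimate t(x) = 1 - omega * dark_channel(I / A)."""
    normalized = [[tuple(c / max(a, 1e-6) for c, a in zip(px, A)) for px in row] for row in img]
    return [[1.0 - omega * d for d in row] for row in dark_channel(normalized, patch)]


def dehaze(img: Image, omega: float = 0.95, patch: int = 15, t0: float = 0.1) -> Image:
    """Recover J(x) = (I(x) - A) / max(t(x), t0) + A, clipped to [0, 1]."""
    A = atmospheric_light(img, dark_channel(img, patch))
    t = transmission(img, A, omega, patch)
    out: Image = []
    for y, row in enumerate(img):
        out.append([
            tuple(min(1.0, max(0.0, (c - a) / max(t[y][x], t0) + a)) for c, a in zip(px, A))
            for x, px in enumerate(row)
        ])
    return out
```

As a usage example, `dehaze(frame)` takes an RGB frame given as nested lists of floats in [0, 1]. The sketch uses plain Python so that it is self-contained. The min filter costs O(patch²) per pixel, so a per-frame module for real-time video would more plausibly use vectorized array operations.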