
Research On Video Object Detection Algorithm Based On Deep Learning

Posted on: 2022-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: H Gao
Full Text: PDF
GTID: 2518306524992539
Subject: Master of Engineering
Abstract/Summary:
The task of object detection is to identify and locate objects in an image. Research on deep learning-based image object detection has made great progress, and classical algorithms such as Faster R-CNN (Region-based Convolutional Neural Networks), YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector) have been proposed in succession. With the explosive growth of video data in recent years, more and more researchers have shifted their attention from image object detection to video object detection. A video sequence carries contextual semantic information, and adjacent frames are highly similar and therefore contain a great deal of redundant information. Compared with static images, video frames are prone to motion blur, occlusion and unusual poses. At present there are two research directions: speeding up video object detection by exploiting the redundancy between frames, and improving its accuracy by exploiting temporal correlation. This thesis uses the YOLOv5 image object detection network as the basis of the video object detection network and a subset of the ImageNet VID dataset as the experimental dataset. Based on the high similarity and temporal correlation between adjacent video frames, detection speed is improved through key frame selection and frame-level propagation, and detection accuracy is improved by using the global semantic information of the video and a memory module. The main work of this thesis is as follows:

(1) A video object detection algorithm based on YOLOv5 is proposed. First, sparse key frames are selected at fixed intervals and fed into the convolutional neural network to extract feature maps and obtain the key frame detection results. Then, the Seq-NMS (Sequence Non-Maximum Suppression) post-processing method is used to suppress redundant bounding boxes generated on the key frames. Finally, the detection results for non-key frames are obtained by frame-level propagation. In experiments on a subset of the ImageNet VID dataset, the detector achieves 81.7% mAP (mean Average Precision) at an offline running speed of 84.2 FPS, which is essentially real time.

(2) A video moving object detection algorithm based on a memory module is proposed. First, an improved two-frame difference method is used to describe the motion information between specified frames. A motion information ratio, analogous to intersection over union, is then computed by dividing the number of pixels with value 1 in both of the two resulting binary masks (the intersection) by the number of pixels with value 1 in either mask (the union), and key frames are selected adaptively according to this ratio. Next, a memory module is introduced and global semantic information is used to enhance weak detection results. Finally, CIoU (Complete Intersection over Union) is used to compute the bounding box regression loss in order to obtain more accurate position information. In experiments on a subset of the ImageNet VID dataset, the detector achieves 82.4% mAP at an offline processing speed of 98.4 FPS, or 10.2 ms per image.

In conclusion, the resulting detector has fewer parameters, faster detection and more accurate bounding box localization than the classical DFF (Deep Feature Flow) and FGFA (Flow-Guided Feature Aggregation) detectors.
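
The motion information ratio in item (2) can be illustrated with a minimal Python sketch. The abstract does not spell out the "improved" two-frame difference method or the selection rule, so everything here is an assumption made for illustration: a plain thresholded absolute difference stands in for the improved method, and the function names (`motion_mask`, `motion_ratio`, `select_keyframes`), the thresholds (`diff_thresh`, `ratio_thresh`) and the rule "start a new key frame when the ratio falls below a threshold" are hypothetical, not taken from the thesis.

```python
import numpy as np


def motion_mask(prev_gray, curr_gray, diff_thresh=25):
    """Binary (0/1) mask of pixels that changed between two grayscale frames.

    Assumption: a plain thresholded absolute difference, not the thesis's
    improved two-frame difference method.
    """
    # Cast to a signed type so uint8 subtraction cannot wrap around.
    diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
    return (diff > diff_thresh).astype(np.uint8)


def motion_ratio(mask_a, mask_b):
    """Pixels set to 1 in both masks divided by pixels set to 1 in either."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # No motion in either mask: treat the two masks as identical.
        return 1.0
    return float(np.logical_and(a, b).sum() / union)


def select_keyframes(frames, ratio_thresh=0.5, diff_thresh=25):
    """Return indices of key frames (hypothetical selection rule).

    Frame 0 is always a key frame. A later frame becomes a key frame when
    its motion mask overlaps too little with the reference motion mask.
    """
    keys = [0]
    ref_mask = None
    for t in range(1, len(frames)):
        mask = motion_mask(frames[t - 1], frames[t], diff_thresh)
        if ref_mask is None:
            ref_mask = mask  # first motion mask serves as the reference
            continue
        if motion_ratio(ref_mask, mask) < ratio_thresh:
            keys.append(t)
            ref_mask = mask
    return keys
```

With this rule, a static clip yields only the first frame as a key frame, while a change in where motion occurs triggers a new one. The thresholds would need tuning on real video.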
Keywords/Search Tags:Video Object Detection, ImageNet VID dataset, frame-level propagation, keyframe selection