Video Object Detection Technology Based On Multi-feature Aggregation

Posted on: 2021-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: S G Ju
GTID: 2518306512487794
Subject: Computer technology
Abstract/Summary:
Video object detection is an active research topic in scene understanding. In recent years, deep learning, one of the most prominent technologies in artificial intelligence, has been widely applied to object detection. Image object detection classifies the objects present in an image and localizes each one with a bounding box. Compared with image object detection, video object detection must also account for the relationship between frames and cope with degradations such as blur and defocus. A typical object detection algorithm processes a single frame, then uses a region proposal network to obtain candidate regions; finally, the features of each candidate region are classified and the bounding box is regressed to produce the result. When exploiting context information from the video sequence, many current video object detection algorithms aggregate feature maps at corresponding spatial locations. This thesis proposes a mechanism for using context information within the model that retains useful information and filters out harmful information, improving video object detection performance through more judicious use of context. The main work of this thesis is as follows:

(1) A multi-feature aggregation model with a bidirectional gating structure is built. For the context information of an image, the gating structure selects useful features and aggregates them into the current feature map. The model is trained on the ImageNet VID and ImageNet DET datasets, and its validity is verified on the ImageNet VID test set.

(2) Because targets change spatial position across video frames, an optical flow network is trained whose output matches the object detection model. The network's output is used to transform and align the feature maps of adjacent frames, and the aligned feature maps serve as the input to the bidirectional gating structure. The effectiveness of this method is also verified.

(3) An interface for processing video files is built that encapsulates the processing details and enables object detection directly on video files. Based on this interface, a small video detection system is built. By separating the feature extraction network and the multi-feature aggregation prediction network into two static computation graphs, the system avoids redundant convolution computation and runs faster.
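
As a rough illustration of contributions (1) and (2), the sketch below aligns the feature maps of neighbouring frames with a flow field and then blends them into the current feature map through a sigmoid gate. It is not the thesis's implementation. The function names (warp_nearest, gated_aggregate), the flow convention, the per-pixel linear gate, the convex-blend update, and the use of nearest-neighbour sampling (a trainable model would more likely use bilinear sampling) are all assumptions made to keep the example short.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def warp_nearest(feat, flow):
    """Align a neighbouring feature map to the current frame.

    feat: (C, H, W) feature map of the neighbouring frame.
    flow: (2, H, W) displacements; flow[0] is dx, flow[1] is dy.
          Assumed convention: position (y, x) in the current frame
          corresponds to (y + dy, x + dx) in the neighbouring frame.
    Samples with nearest-neighbour rounding and clamps at the border.
    """
    _, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_x = np.clip(np.rint(xs + flow[0]).astype(int), 0, W - 1)
    src_y = np.clip(np.rint(ys + flow[1]).astype(int), 0, H - 1)
    return feat[:, src_y, src_x]  # shape (C, H, W)


def gated_aggregate(current, aligned_neighbors, w_gate, b_gate):
    """Blend aligned neighbour features into the current feature map.

    current:           (C, H, W) feature map of the current frame.
    aligned_neighbors: list of (C, H, W) maps, e.g. previous and next
                       frames (one hypothetical reading of "bidirectional").
    w_gate:            (C, 2C) weights of a per-pixel linear gate.
    b_gate:            (C,) gate bias.
    A gate near 1 admits the neighbour feature; near 0 filters it out.
    """
    out = current.copy()
    for nb in aligned_neighbors:
        stacked = np.concatenate([current, nb], axis=0)        # (2C, H, W)
        logits = np.einsum("oc,chw->ohw", w_gate, stacked)     # (C, H, W)
        gate = sigmoid(logits + b_gate[:, None, None])
        out = (1.0 - gate) * out + gate * nb
    return out


# Minimal usage with random features and a uniform one-pixel shift in x.
rng = np.random.default_rng(0)
C, H, W = 4, 5, 6
cur_feat = rng.normal(size=(C, H, W))
prev_feat = rng.normal(size=(C, H, W))
next_feat = rng.normal(size=(C, H, W))
shift_x = np.zeros((2, H, W))
shift_x[0] = 1.0
aligned = [warp_nearest(prev_feat, shift_x), warp_nearest(next_feat, -shift_x)]
fused = gated_aggregate(cur_feat, aligned, rng.normal(size=(C, 2 * C)), np.zeros(C))
```

The fused map keeps the shape of the current feature map, so under these assumptions it could be passed to an unchanged detection head.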
Keywords/Search Tags: Video object detection, gating structure, optical flow, spatial transform