Detection And Tracking Based On Improving Feature Representation

Posted on: 2022-08-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Gong
GTID: 1488306323982449
Subject: Cyberspace security
Abstract/Summary:
In recent years, cyberspace has become as closely connected with people's lives as the real world. Maintaining the security of cyberspace has therefore become an urgent problem, which means that the information on the network needs to be monitored and filtered in real time. In cyberspace, information is transmitted through various media, such as images, videos, and sound, so computer vision technologies are widely used in cyberspace security, for example to intelligently filter images and videos related to pornography, terrorism, and drugs on the Internet. In these applications the sensitive area may be small, so objects in images and videos must be identified, located, and tracked to facilitate image and video analysis. Object detection and multi-object tracking therefore have important research value.

Real-world conditions are very complex:
(1) Images contain complex scenes, illumination variation, and many different objects and object categories.
(2) Objects in images vary in scale and shape and occlude one another.
(3) Objects in videos are difficult to recognize because of motion blur and out-of-focus lenses.
(4) The same object is difficult to distinguish in videos because similar objects are present.

These problems pose challenges for object detection and multi-object tracking algorithms:
(1) Object detection algorithms cannot extract good enough image features because of the complex scenes in images.
(2) Object detection algorithms cannot extract good enough object features because of the different scales and shapes of objects and the occlusions among them.
(3) Video object detection algorithms cannot extract good enough object features because of motion blur and defocus in videos.
(4) Multi-object tracking algorithms cannot associate the same object instance because the features of objects within the same category are highly similar.

This thesis investigates these challenges in depth. The main work and contributions are as follows:

1. A new object detection algorithm is proposed that builds on the multi-label image classification task and tries to extract better image features under complex scenes. A new branch that solves the multi-label classification task is added to a convolutional-neural-network-based object detection algorithm. Because an attention mechanism is used to generate the multi-label classification feature map, the features of object regions and background regions are more discriminative in this feature map, which may help object detection. ROI pooling and a purpose-designed gate module are therefore used to fuse the multi-label classification features into the object detection features in order to improve detection accuracy. Extensive experiments show that the proposed algorithm improves the performance of existing object detection algorithms on publicly available datasets.

2. A pyramid sub-region sensitive network for object detection is proposed, which tries to extract better object features. Because ROI pooling can only capture the global information of objects, and PS ROI pooling can only extract local features of objects, pyramid sub-region sensitive ROI pooling is proposed to capture both the global and the coarse-to-fine local features of objects. Since the global and coarse-to-fine local features may play different roles in the classification and regression of objects, self-adapting learning factors are proposed to weight the features. Extensive experiments show that, compared with other object detection algorithms, the proposed algorithm achieves comparable or better performance on publicly available datasets.

3. A Temporal ROI Align method is proposed to extract temporal object features, addressing the problem of how to better extract object features in videos affected by motion blur and defocus. Existing ROI Align methods for proposal feature extraction can only extract object features from the current frame, so Temporal ROI Align is proposed to extract temporal features of proposals. Because an object appears in multiple frames of a video, Temporal ROI Align can extract features for proposals of the current frame from the features of other frames. Considering that an object may be clear in some frames and blurry in others, Temporal ROI Align uses Temporal Attentional Feature Aggregation to aggregate the ROI features extracted from the current frame and the other frames. Extensive experiments show that the proposed Temporal ROI Align improves the performance of various video object detection algorithms on public datasets.

4. A multi-object tracking algorithm that integrates single shot tracking with identity-aware feature aggregation is proposed, addressing the problem of how to improve the feature similarity of the same object in a video when objects of the same category have highly similar features. Because the features that existing similarity models extract for the same object are not sufficiently similar, a module that aggregates the features of the same object is proposed. This module guides the network to aggregate the features of the same object across adjacent frames, making these features more similar and making it easier for the network to associate the same object across adjacent frames. In addition, because the speed of existing motion models drops sharply as the number of objects in the video increases, a Single Shot Tracking method is proposed that makes the speed of the multi-object tracking algorithm independent of the number of objects in the video. Experimental results show that, compared with most state-of-the-art multi-object tracking algorithms, the proposed algorithm achieves better performance on public datasets while maintaining a faster running speed.
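
The abstract does not give the formulation of Temporal Attentional Feature Aggregation, so the following is only a minimal sketch of the general idea, under stated assumptions: each ROI feature is flattened to a plain vector, similarity to the current-frame feature is measured with cosine similarity, and the weights come from a softmax over frames. The function names and the `temperature` parameter are hypothetical and do not come from the thesis.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_roi_features(current, supports, temperature=1.0):
    """Fuse the current-frame ROI feature with ROI features from other frames.

    Assumption (not from the thesis): the attention weight of each frame is a
    softmax over its cosine similarity to the current-frame feature.
    Returns the fused feature and the per-frame weights (current frame first).
    """
    frames = [current] + list(supports)
    scores = [cosine(current, f) / temperature for f in frames]
    weights = softmax(scores)
    fused = [
        sum(w * f[i] for w, f in zip(weights, frames))
        for i in range(len(current))
    ]
    return fused, weights


# Hypothetical toy input: one current-frame feature and two support-frame features.
fused, weights = aggregate_roi_features([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In this sketch, support frames whose ROI feature resembles the current one receive larger weights, so dissimilar features contribute less to the fused result. The method in the thesis may compute attention differently, for example per spatial location rather than per flattened vector.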
Keywords/Search Tags: Deep Learning, Object Detection, Video Object Detection, Multi-Object Tracking