
Object Detection for Video Sequences and Point Clouds

Posted on: 2022-04-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J J Deng
Full Text: PDF
GTID: 1488306323962809
Subject: Information and Communication Engineering
Abstract/Summary:
Object detection is one of the fundamental topics in computer vision and is widely adopted in security monitoring, autonomous driving, and intelligent robotics. The goal of object detection is to localize the objects appearing in a scene and to predict their categories. Advances in deep learning have pushed the limits of object detection for images and improved the state of the art. However, applying object detection technologies in real-world scenarios requires extensions in both the temporal and spatial dimensions. On one hand, the image signals processed in daily life are usually acquired as sequences, and extending applications from still images to video sequences has led to the task of video object detection. On the other hand, in the spatial dimension, processing 2D images does not satisfy the need for intelligent systems to precisely locate objects in real-world 3D space. As an emerging data modality with precise distance sensing, LiDAR point clouds have been widely adopted in 3D object detection, and expanding from 2D images to 3D point clouds leads to the task of point cloud object detection. In this dissertation, we focus on object detection for video sequences and point clouds. The main contributions are summarized as follows:

· We propose a video object detection method based on cross-frame relationship modeling among region proposals. Video object detection methods following the feature-aggregation paradigm usually establish explicit correspondences between pixels in adjacent frames and use these correspondences to guide motion compensation and feature augmentation. However, pixel-wise correspondence is not robust to frames with motion blur, video defocus, and object part occlusion, which deteriorates the effectiveness of these methods. To alleviate these problems, we propose a novel video object detection method based on relationship modeling over multi-frame region proposals. It relies on the attention mechanism to capture spatio-temporal context through a multi-stage cascade architecture, aggregating features into the current region proposals to be detected and improving the precision of video object detection.

· We propose a point cloud object detection method based on voxel region features. Object detection for point clouds relies heavily on how the point clouds are represented; the common representations are either point-based or voxel-based. In contrast to the intuition that precise positioning of raw points is essential for high-performance 3D object detection, we find that a coarse voxel granularity can offer sufficient detection accuracy. We introduce a neat yet effective two-stage framework, named Voxel R-CNN, and devise a voxel RoI pooling operation that takes full advantage of the memory-locality property of the voxel representation for proposal feature extraction. The extracted region features are further leveraged to perform proposal refinement. The proposed method strikes a careful balance between accuracy and efficiency.

· We propose a point cloud object detection method based on voxel representations hallucinated from multi-view features. The perspective view and bird's-eye view of LiDAR point clouds are naturally complementary. In the perspective view, points are densely distributed with clearly recognizable semantic information, which benefits object classification; in the bird's-eye view, the scale of objects does not change with distance from the sensor and objects do not overlap, which benefits object localization. We present a novel architecture that hallucinates a 3D voxel representation from perspective-view and bird's-eye-view features, extending the model of the second contribution into a pseudo-3D Voxel R-CNN. In this framework, the extraction of voxel features no longer depends on a 3D convolutional backbone, further improving the efficiency of the algorithm.
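The first contribution's attention-based aggregation over multi-frame proposals can be sketched as scaled dot-product attention between a current frame's proposal features (queries) and proposals from support frames (keys/values). This is a minimal illustration, not the dissertation's exact relation module: the random projection matrices stand in for learned weights, and a real model would stack several such stages in a cascade.

```python
import numpy as np

def relation_attention(query_feats, support_feats, dim=64, seed=0):
    """Aggregate support-frame proposal features into current-frame
    proposals via scaled dot-product attention.

    query_feats:   (Nq, C) features of proposals in the current frame
    support_feats: (Ns, C) features of proposals from other frames
    The projections are random stand-ins for learned parameters.
    """
    rng = np.random.default_rng(seed)
    c = query_feats.shape[1]
    Wq = rng.standard_normal((c, dim)) / np.sqrt(dim)
    Wk = rng.standard_normal((c, dim)) / np.sqrt(dim)
    q = query_feats @ Wq                       # (Nq, dim)
    k = support_feats @ Wk                     # (Ns, dim)
    scores = q @ k.T / np.sqrt(dim)            # (Nq, Ns) relation weights
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over supports
    return weights @ support_feats             # (Nq, C) aggregated context
```

In a cascade, the output would be fused with the query features (e.g. added residually) and fed to the next stage, letting later stages attend with progressively refined proposal representations.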
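The voxel RoI pooling of the second contribution exploits the fact that voxels sit on a regular grid, so neighbors can be looked up directly by integer index instead of by nearest-neighbor search over raw points. The sketch below uses a plain dictionary keyed by voxel coordinates as a simplified stand-in for the sparse voxel storage; the cubic neighborhood and max-pooling are illustrative choices, not the dissertation's exact operator.

```python
import numpy as np

def voxel_roi_pool(voxel_feats, roi_center, radius=1):
    """Pool features from the cubic voxel neighborhood around an RoI
    grid point.

    voxel_feats: dict mapping integer (x, y, z) voxel indices to
                 feature vectors; empty voxels are simply absent,
                 so each neighbor lookup is O(1).
    roi_center:  integer (x, y, z) voxel index of the RoI grid point.
    """
    cx, cy, cz = roi_center
    gathered = []
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            for dz in range(-radius, radius + 1):
                f = voxel_feats.get((cx + dx, cy + dy, cz + dz))
                if f is not None:
                    gathered.append(f)
    if not gathered:  # RoI falls in empty space
        return np.zeros_like(next(iter(voxel_feats.values())))
    return np.max(gathered, axis=0)  # max-pool over non-empty neighbors
```

Because only occupied voxels are stored and queried, the cost per RoI grid point is bounded by the neighborhood size, independent of the total number of points in the scene, which is the efficiency property the abstract highlights.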
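The third contribution replaces the 3D convolutional backbone by composing per-voxel features from two 2D feature maps. A minimal sketch, under assumed index conventions (each voxel carries its bird's-eye-view cell and its perspective-view pixel): the pseudo-3D voxel feature is simply the concatenation of the two 2D features, so only 2D backbones are ever run.

```python
import numpy as np

def hallucinate_voxel_feats(bev_feats, pv_feats, voxel_coords):
    """Compose pseudo-3D voxel features from two 2D view feature maps.

    bev_feats:    (H_bev, W_bev, C_bev) bird's-eye-view feature map
    pv_feats:     (H_pv, W_pv, C_pv) perspective-view feature map
    voxel_coords: list of (bx, by, pr, pc) tuples giving, per voxel,
                  its BEV cell and its perspective-view pixel
                  (an illustrative mapping, not the thesis' exact one).
    Returns (N, C_bev + C_pv) voxel features without any 3D convolution.
    """
    out = []
    for bx, by, pr, pc in voxel_coords:
        out.append(np.concatenate([bev_feats[bx, by], pv_feats[pr, pc]]))
    return np.stack(out)
```

The resulting per-voxel features could then feed the same voxel RoI pooling and refinement head as the second contribution, which is the sense in which the model becomes a pseudo-3D Voxel R-CNN.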
Keywords/Search Tags: Object Detection, Video Analysis, Point Cloud Processing, Convolutional Neural Network, Relationship Modeling, Voxel-based Representation, Multi-view Projection