Autonomous driving has attracted considerable attention in recent years, and many of its core technologies have made significant progress thanks to the development of deep learning. An autonomous driving system is mainly composed of three parts: perception, planning, and control. The perception module serves as the eyes of the system; it is responsible for perceiving the surrounding environment and providing accurate scene information to the planning and control modules. Perception in autonomous driving scenarios is a challenging task because the working environments of autonomous vehicles are extremely complex. This paper focuses on monocular object perception in autonomous driving scenarios and makes two contributions.

1. An anchor-based single-stage monocular 3D object detector. The goal of 3D object detection is to identify the categories of surrounding obstacles and their 3D bounding boxes, which are parameterized by size, location, and orientation. This paper proposes an anchor-based single-stage neural network with feature alignment and asymmetric non-local attention for monocular 3D object detection. A two-step feature alignment method is proposed to address the mismatch between the receptive field of a feature and its anchor in single-stage object detectors. In addition, an asymmetric non-local attention block is proposed to combine environmental information with depth-wise feature extraction, which improves the accuracy of object depth estimation. Experimental results on the KITTI dataset show that the proposed method significantly improves performance on both the 3D object detection and bird's eye view tasks.

2. A monocular object referral method based on cross-modal transformers. Object referral in the autonomous driving setting considers the situation where a passenger gives a command that may be associated with an object in a street scene, such as "drive up to the right side of that truck in front of us." The goal of object referral is to retrieve the corresponding object in the scene according to the natural language command. This paper proposes a framework using cross-modal transformers to tackle the joint understanding of vision and language. A convolutional neural network is adopted for visual feature extraction, and the transformer encoder is used to learn linguistic features. Linguistic and visual features are matched and aggregated through cross-modal attention in the transformer decoder to learn cross-modal representations. Experimental results on the Talk2Car dataset demonstrate that the proposed method outperforms previous methods by a wide margin.
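To make the first contribution more concrete, the sketch below shows one plausible PyTorch form of an asymmetric non-local attention block, in which the keys and values are spatially pooled while the queries keep full resolution, so every location attends to a compressed summary of the whole scene. The class name, channel reduction, and pooling size are illustrative assumptions and are not taken from the paper.

```python
import torch
import torch.nn as nn

class AsymmetricNonLocalAttention(nn.Module):
    """Hypothetical sketch: non-local attention with spatially pooled keys/values."""
    def __init__(self, in_channels, reduced_channels=None, pool_size=4):
        super().__init__()
        reduced_channels = reduced_channels or in_channels // 2
        self.query = nn.Conv2d(in_channels, reduced_channels, 1)
        self.key = nn.Conv2d(in_channels, reduced_channels, 1)
        self.value = nn.Conv2d(in_channels, reduced_channels, 1)
        self.out = nn.Conv2d(reduced_channels, in_channels, 1)
        # Pooling the key/value branch makes the attention asymmetric: full-resolution
        # queries attend to a small fixed grid that summarizes global context.
        self.pool = nn.AdaptiveAvgPool2d(pool_size)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)              # (B, H*W, C')
        k = self.key(self.pool(x)).flatten(2)                     # (B, C', S*S)
        v = self.value(self.pool(x)).flatten(2).transpose(1, 2)   # (B, S*S, C')
        attn = torch.softmax(q @ k / (k.shape[1] ** 0.5), dim=-1) # (B, H*W, S*S)
        ctx = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)     # (B, C', H, W)
        # Residual connection: keep the original features and add global context.
        return x + self.out(ctx)
```

As a usage example, `AsymmetricNonLocalAttention(256)(torch.randn(1, 256, 48, 160))` returns a tensor of the same shape with global context mixed into every location; pooling the key/value branch keeps the attention cost low even on the wide feature maps typical of driving scenes.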
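For the second contribution, the sketch below illustrates one possible cross-modal transformer layout under the abstract's description: a CNN extracts visual features, a transformer encoder learns linguistic features, and a transformer decoder fuses the two through cross-attention. The backbone choice, which modality acts as queries, and the per-location scoring head are assumptions for illustration, not the authors' exact architecture; positional encodings are omitted for brevity.

```python
import torch
import torch.nn as nn
import torchvision

class CrossModalGrounding(nn.Module):
    """Hypothetical sketch: ground a language command in image features with
    a transformer encoder (text) and decoder (cross-modal attention)."""
    def __init__(self, vocab_size, d_model=256, nhead=8, num_layers=3):
        super().__init__()
        # CNN backbone for visual feature extraction (ResNet-18 chosen arbitrarily).
        backbone = torchvision.models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # (B, 512, H/32, W/32)
        self.vis_proj = nn.Conv2d(512, d_model, 1)
        # Transformer encoder learns linguistic features from embedded command tokens.
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.text_encoder = nn.TransformerEncoder(enc_layer, num_layers)
        # Transformer decoder: visual tokens attend to linguistic features
        # (which modality queries the other is an assumption here).
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
        self.score = nn.Linear(d_model, 1)  # matching score per visual location

    def forward(self, image, tokens):
        vis = self.vis_proj(self.cnn(image))           # (B, D, h, w)
        vis = vis.flatten(2).transpose(1, 2)           # (B, h*w, D)
        lang = self.text_encoder(self.embed(tokens))   # (B, T, D)
        fused = self.decoder(tgt=vis, memory=lang)     # cross-modal attention
        return self.score(fused).squeeze(-1)           # (B, h*w) grounding logits
```

In a full object-referral pipeline such as one evaluated on Talk2Car, these per-location logits would typically be matched against region proposals or a ground-truth box to retrieve the referred object; that step is outside the scope of this sketch.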