
Multimodal Deep Learning Object Detection and Application

Posted on: 2019-02-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Lin
Full Text: PDF
GTID: 2348330563954052
Subject: Control Science and Engineering
Abstract/Summary:
As an important part of scene perception for intelligent vehicles, detection accuracy and real-time performance are essential preconditions for precise vehicle control. This paper presents a multimodal approach that meets both the accuracy and the real-time requirements of road-scene detection (mainly vehicle detection), achieving better detection results than traditional single-modal approaches. The main contributions of this paper are as follows:

A multimodal detection scheme combining camera images with LIDAR is proposed to overcome the limited expressive power of image-only scene representations. Fusing LIDAR distance information with pixel information greatly reduces the misdetections and missed detections that a single modality produces. In subjective and objective experiments on the public KITTI benchmark, accuracy improves by 10% to 15% over the single-modal baseline (CPU), and single-frame speed doubles compared with a similar multimodal algorithm (GPU).

A 3D bounding-box extraction method that fuses LIDAR bird's-eye-view (overlooking-view) ROIs with image ROIs is proposed to address the sparsity of the 3D LIDAR point cloud relative to 2D image pixels. If the point cloud is projected directly onto the image through the translation and rotation between the two coordinate frames, the number of LIDAR points is far smaller than the number of pixels, leaving very little usable LIDAR information while increasing time complexity. Instead, the point cloud is projected onto the bird's-eye view and down-sampled by a convolutional network, and an RPN generates 3D proposals and regresses the coordinates of the eight corners of the 3D box. Compared with single-modal methods and with the approach of the previous chapter, the image-plus-LIDAR method is more accurate: overall accuracy improves by 5%, and single-frame processing speed increases by about 3 times. Furthermore, because a 3D bounding box is extracted, more 3D position and orientation information about the detected object becomes available, which is also a direction for future work.

A multimodal deep-learning application for structured road scenarios is developed, starting from the real-time and accuracy requirements of unmanned-vehicle object detection. Building on the previous two contributions, the models are adapted to the characteristics of structured road scenes, discarding components unsuited to them. The resulting multimodal deep-learning obstacle-detection system is both real-time and accurate; it uses PASCAL VOC2007 and data collected with Changchun FAW Hongqi (Red Flag) vehicles as the offline training database, was tested online at low and high speeds at the Changchun Nong'an proving ground and on a ring-road section, and was compared with other forward obstacle-detection systems.
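The direct point-to-pixel projection discussed above (translating and rotating LIDAR points into the camera frame, then projecting through the camera intrinsics) can be sketched as follows. This is a minimal illustration in KITTI-style conventions; the matrix names `R`, `t`, `K` and the function name are illustrative, not taken from the thesis:

```python
import numpy as np

def project_lidar_to_image(points, R, t, K, img_w, img_h):
    """Project LIDAR points (N, 3) into the image plane.

    R (3x3) and t (3,) are the extrinsic rotation/translation from the
    LIDAR frame to the camera frame; K (3x3) is the camera intrinsic
    matrix. Returns pixel coordinates (N, 2) and a mask of points that
    land inside the image with positive depth.
    """
    cam = points @ R.T + t            # LIDAR frame -> camera frame
    depth = cam[:, 2]
    uvw = cam @ K.T                   # perspective projection
    uv = uvw[:, :2] / uvw[:, 2:3]     # normalize by depth
    mask = (depth > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < img_w) \
         & (uv[:, 1] >= 0) & (uv[:, 1] < img_h)
    return uv, mask
```

As the abstract notes, the mask is typically very sparse: a 64-beam scan projects on the order of 10^4 valid points into an image of 10^5–10^6 pixels, which is exactly the imbalance that motivates the bird's-eye-view route instead.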
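The bird's-eye-view (overlooking-view) projection that feeds the convolutional network can likewise be sketched as a simple rasterization of the point cloud into a 2D grid. The grid ranges, resolution, and the choice of a max-height plus point-density channel pair here are assumptions for illustration, not the thesis's actual configuration:

```python
import numpy as np

def lidar_to_bev(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), res=0.1):
    """Rasterize a LIDAR point cloud (N, 3) into a bird's-eye-view grid
    with a max-height channel and a point-density channel -- the kind of
    2D input a convolutional network and RPN can then down-sample."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    keep = (x >= x_range[0]) & (x < x_range[1]) & \
           (y >= y_range[0]) & (y < y_range[1])
    x, y, z = x[keep], y[keep], z[keep]
    H = int((x_range[1] - x_range[0]) / res)
    W = int((y_range[1] - y_range[0]) / res)
    bev = np.zeros((2, H, W), dtype=np.float32)   # [max height, density]
    xi = ((x - x_range[0]) / res).astype(int)     # cell indices
    yi = ((y - y_range[0]) / res).astype(int)
    np.maximum.at(bev[0], (xi, yi), z)            # tallest point per cell
    np.add.at(bev[1], (xi, yi), 1.0)              # point count per cell
    return bev
```

The design choice this illustrates is that, unlike the image projection, every LIDAR point contributes to exactly one cell of a dense 2D tensor, so standard 2D convolutions apply without wasting computation on empty pixels.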
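Since the RPN regresses the coordinates of the eight corners of each 3D box, a minimal helper for recovering those corners from a parameterized box may clarify what is being predicted. The (center, size, yaw) parameterization and the function name are hypothetical conveniences, not the thesis's encoding:

```python
import numpy as np

def box3d_corners(center, size, yaw):
    """Return the 8 corner coordinates (8, 3) of a 3D box rotated by
    `yaw` about the vertical axis. center=(x, y, z) is the box center,
    size=(l, w, h) its length, width, and height."""
    l, w, h = size
    # Corners in the box's own frame, centered at the origin.
    xs = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * l / 2
    ys = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * w / 2
    zs = np.array([-1, -1, -1, -1, 1, 1, 1, 1]) * h / 2
    c, s = np.cos(yaw), np.sin(yaw)
    Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return np.stack([xs, ys, zs], axis=1) @ Rz.T + np.asarray(center)
```

Regressing all eight corners (24 numbers) rather than a (center, size, yaw) tuple is one way such a network can output a 3D frame directly, which is what makes the extra position and orientation information mentioned above available.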
Keywords/Search Tags: multimodal, depth information, overlooking view, 3D bounding-box, structured road