
3D Object Representation And Detection In Complex Scene

Posted on: 2020-06-02  Degree: Master  Type: Thesis
Country: China  Candidate: J Kang  Full Text: PDF
GTID: 2428330596976598  Subject: Engineering
Abstract/Summary:
3D object detection and pose estimation are important for applications such as robotics, autonomous driving, and augmented reality. 3D object detection means recognizing the 3D location and orientation of objects, which gives robots the object information they need to operate intelligently. However, because real-world objects are so diverse, the target object needs a suitable representation if the detection algorithm is to meet real-time and accuracy requirements. At the same time, scene clutter and occlusion make 3D object detection very challenging. The main contributions of this thesis are as follows:

To compare the keypoint selection methods used in current 3D object detection, we conduct ablation studies with an end-to-end, one-stage regression network. We compare the 3D bounding box, farthest point sampling (FPS), and the minimum bounding sphere proposed in this thesis, and choose the most suitable object representation for subsequent network training.

To address the poor pose estimation accuracy caused by deformation of the 3D bounding box after image projection, this thesis proposes ER-6DYOLO, a network based on a constraint loss on the edge lengths of the bounding box. Using the prior that the 3D bounding box is a cuboid, we design a new loss function, Edge Restrain Loss, for the parallel edges of the predicted bounding box. It effectively reduces the length differences between parallel edges of the 3D bounding box keypoints after image projection and improves the precision of pose detection. Normalizing the loss function overcomes the problem of the predicted bounding box changing in length and also accelerates network convergence. On the public LINEMOD dataset, the average 3D distance (ADD) metric reaches 60% at a test speed of 80 frames per second. ER-6DYOLO ranks first among algorithms based on real image input.

To address the tendency of predicted keypoints to shift toward the occluding object in occluded scenes, this thesis proposes the AttLoss loss function, which gathers the bounding boxes responsible for predicting the same object as densely as possible around the label value and guides the network to learn the features of the unoccluded part. This alleviates the false detection problem to some extent. Experiments show that introducing AttLoss raises the 2D projection metric by 13.61% on the Occlusion dataset compared with the SingleshotPose network.

To address small-sample datasets and the lack of real pose labels, this thesis introduces the object contour as privileged information and designs a new network framework called 6DPose-PCNet. A contour prediction branch is added to the network; through upsampling and fusion with low-level features, it guides the low-level features to learn the object's contour information, which provides stronger features for keypoint detection. The contour prediction branch is kept as simple as possible to limit its parameter count. The test speed is 72 frames per second, three times that of the state-of-the-art algorithm PVNet. Experiments show that 6DPose-PCNet achieves 93.97% on the 2D projection metric and 64.19% on the ADD metric on the LINEMOD dataset, which are 2.86% and 4.19% higher than ER-6DYOLO, respectively.
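For readers unfamiliar with the ADD metric cited above, the following is a minimal sketch of how it is commonly computed on LINEMOD: the model points are transformed by the ground-truth pose and by the predicted pose, and the mean distance between corresponding points is compared with a fraction of the model diameter. The function names, the toy cube model, and the 10% threshold are assumptions made for illustration; the abstract does not state which threshold or implementation the thesis uses.

```python
import math

def transform(points, rotation, translation):
    """Apply x -> R x + t to each 3D point (R is 3x3 nested lists, t has length 3)."""
    return [
        tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        for p in points
    ]

def add_distance(points, pose_gt, pose_pred):
    """Mean distance between model points under the ground-truth and predicted poses."""
    gt = transform(points, *pose_gt)
    pred = transform(points, *pose_pred)
    return sum(math.dist(a, b) for a, b in zip(gt, pred)) / len(points)

def model_diameter(points):
    """Largest distance between any two model points (O(n^2), fine for a sketch)."""
    return max(math.dist(a, b) for a in points for b in points)

def pose_is_correct(points, pose_gt, pose_pred, ratio=0.1):
    """Assumed convention: correct if ADD is below `ratio` times the model diameter."""
    return add_distance(points, pose_gt, pose_pred) < ratio * model_diameter(points)

# Hypothetical toy model: the eight corners of a unit cube.
corners = [(x, y, z) for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pose_gt = (identity, (0.0, 0.0, 1.0))
pose_close = (identity, (0.05, 0.0, 1.0))  # small translation error
pose_far = (identity, (0.3, 0.0, 1.0))     # large translation error

print(add_distance(corners, pose_gt, pose_close))
print(pose_is_correct(corners, pose_gt, pose_close))
print(pose_is_correct(corners, pose_gt, pose_far))
```

Under this convention, a reported ADD score such as 60% would be the share of test images whose predicted pose passes the check, not a distance.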
Keywords/Search Tags: pose prediction, convolutional neural network, privileged information, 3D object representation, complex scene