| Object recognition is an important task in computer vision, with a wide range of applications including navigation, intelligent robots, unmanned vehicles and surveillance. Compared to 2D images, 3D data (e.g., point clouds, meshes or voxels) provide more geometric information and are insensitive to variations in both illumination and scale. Therefore, 3D data can be used to improve the performance of several tasks, including 3D object detection and recognition. Conventional 3D object recognition algorithms usually rely on hand-crafted 3D features combined with off-the-shelf classifiers (such as SVMs) to predict the labels of 3D objects. Recent advances in inexpensive 3D sensors such as Microsoft Kinect and Google Project Tango have made 3D data more accessible and have greatly increased the amount of publicly available 3D data. This in turn enables deep learning methods to be applied to 3D data, although doing so remains challenging. In particular, Convolutional Neural Networks (CNNs) have been extensively investigated and have achieved state-of-the-art 3D object recognition performance. This dissertation presents extensive theoretical and technical research on 3D object recognition using deep learning methods, especially 3D volumetric CNNs. On the theoretical side, it first reviews the advances in applying CNNs to 3D object recognition. It then gives a detailed summary of the fundamentals of CNNs, from core layers such as convolutional, pooling and fully connected layers to a variety of gradient-descent-based optimization algorithms. In addition, we describe several commonly used regularization methods and offer practical tips for training neural networks.
For 3D object recognition itself, we propose LightNet, a lightweight 3D volumetric CNN architecture for real-time 3D object recognition. LightNet combines subvolume supervision and orientation prediction tasks within a shallow volumetric CNN architecture to facilitate 3D object recognition. During training, the model learns 3D features for category and orientation classification from both complete and partial objects. Benefiting from these auxiliary training tasks, LightNet outperforms existing single volumetric CNN models on both the ModelNet dataset and the Sydney Urban Objects dataset while using the smallest number of trainable parameters. LightNet is further evaluated on the large-scale ShapeNet Core55 dataset to demonstrate its effectiveness. These results show that our model fits both small-scale and large-scale datasets well while mitigating overfitting. Experiments also show that our model can recognize 3D objects in real time. |
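The idea of training one network on auxiliary tasks (category plus orientation, from complete plus partial objects) can be illustrated with a minimal NumPy sketch of the training objective. This is not LightNet's implementation: the helper names `crop_subvolumes` and `multitask_loss`, the use of non-overlapping input crops as "partial objects", the orientation weight of 0.5, the 10 categories and 12 orientation bins, and the equal averaging over complete and partial views are all assumptions made for illustration. The network itself is replaced by placeholder logits.

```python
import numpy as np


def softmax_cross_entropy(logits, label):
    """Cross-entropy of an integer label under softmax(logits), computed stably."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max()  # subtract the max to avoid overflow in exp
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[label])


def crop_subvolumes(grid, size):
    """Split a cubic voxel grid into non-overlapping size^3 blocks.

    Hypothetical stand-in for 'partial objects': each block keeps only
    part of the occupied voxels of the complete object.
    """
    n = grid.shape[0]
    if grid.shape != (n, n, n) or n % size != 0:
        raise ValueError("grid must be cubic with side divisible by size")
    return [
        grid[i:i + size, j:j + size, k:k + size]
        for i in range(0, n, size)
        for j in range(0, n, size)
        for k in range(0, n, size)
    ]


def multitask_loss(view_outputs, category, orientation, orient_weight=0.5):
    """Average of category CE + orient_weight * orientation CE over all views.

    view_outputs: list of (category_logits, orientation_logits) pairs, one
    per view (e.g. the complete object followed by its partial subvolumes).
    """
    total = 0.0
    for cat_logits, ori_logits in view_outputs:
        total += softmax_cross_entropy(cat_logits, category)
        total += orient_weight * softmax_cross_entropy(ori_logits, orientation)
    return total / len(view_outputs)


# Usage with a toy occupancy grid and random placeholder logits.
rng = np.random.default_rng(0)
grid = (rng.random((32, 32, 32)) > 0.9).astype(np.float32)
parts = crop_subvolumes(grid, 16)  # 8 partial views of 16^3 voxels each

# Placeholder outputs of two hypothetical heads: 10 categories, 12 orientation bins.
views = [(rng.normal(size=10), rng.normal(size=12)) for _ in range(1 + len(parts))]
loss = multitask_loss(views, category=3, orientation=5)
```

Weighting the orientation term below the category term keeps classification the primary objective while the orientation task acts as extra supervision; the weight is a tunable assumption here, not a value taken from the dissertation.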