Research on RGB-D Object Recognition Based on Deep Learning Algorithms

Posted on: 2016-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: L F Lv
Full Text: PDF
GTID: 2308330476952165
Subject: Computer application technology
Abstract/Summary:
Deep learning is a relatively new field of machine learning. By constructing multi-layer neural networks, deep learning aims to mimic the mechanisms the human brain uses to analyze and interpret data such as images, audio, and text. Deep networks form high-level abstract features by combining low-level features, with the aim of discovering distributed feature representations of the data. Image recognition is one of the most important and difficult problems in computer vision, and improving recognition accuracy is decisive for the widespread adoption of autonomous robots. The successful application of deep learning to image recognition has further promoted the development of computer vision.

Image recognition based on RGB and gray-scale images has made some progress in the past few years. However, because RGB and gray-scale images contain limited information useful for classification, most previous work based on these two image types struggles to meet the current high accuracy requirements of image recognition tasks.

RGB-D cameras (such as the Microsoft Kinect), which use new sensing technology, can simultaneously record a high-resolution RGB image and a depth image. The RGB image contains the surface color and texture information of the object, while the depth image contains its spatial shape information, so the two are effective complements to each other. Using deep learning to combine RGB and depth images effectively, and thereby improve object recognition accuracy, has become an active research area in deep learning.

In this thesis, we first propose a deep learning model that combines the k-sparse autoencoder algorithm with the spatial pyramid max pooling algorithm. The k-sparse autoencoder extracts low-level features, which are then passed to spatial pyramid max pooling to extract high-level abstract features. Experimental results show that this algorithm extracts discriminative features and improves RGB-D object recognition accuracy.

Second, building on the sparse autoencoder algorithm, we propose an improved multi-modal sparse autoencoder algorithm and a new deep learning model. The new multi-modal sparse autoencoder can effectively fuse RGB features and depth features at the raw image level. Experimental results show that this fusion of RGB and depth features takes fuller advantage of RGB-D images than simply concatenating them, and RGB-D object recognition accuracy improves further.

Finally, we extract many types of features from RGB-D images and fuse them at the decision level through a static linear combination. The results show that fusing RGB and depth image features at the decision level is an effective way to take full advantage of RGB-D images, and RGB-D object recognition accuracy improves further.
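
The abstract names three building blocks without giving details: k-sparse feature encoding, spatial pyramid max pooling, and decision-level fusion by linear combination. The following is a minimal NumPy sketch of what each step might look like. It is not the thesis implementation. The function names, the pyramid levels (1, 2, 4), and the fixed fusion weights are all assumptions made for illustration, and the encoder weights W and b are assumed to be already trained.

```python
import numpy as np

def k_sparse(z, k):
    """Keep the k largest activations in each row of z and zero the rest.

    Assumes 1 <= k <= z.shape[1].
    """
    z = np.asarray(z, dtype=float)
    # Column indices of the k largest entries in every row.
    idx = np.argpartition(z, -k, axis=1)[:, -k:]
    rows = np.arange(z.shape[0])[:, None]
    out = np.zeros_like(z)
    out[rows, idx] = z[rows, idx]
    return out

def encode(patches, W, b, k):
    """Linear encoder followed by k-sparse selection (W, b assumed pre-trained)."""
    return k_sparse(np.asarray(patches) @ W + b, k)

def spatial_pyramid_max_pool(fmap, levels=(1, 2, 4)):
    """Max-pool an (H, W, C) feature map over n x n grids and concatenate.

    The levels are an assumed choice; H and W must be at least max(levels).
    """
    H, W, _ = fmap.shape
    pooled = []
    for n in levels:
        # Integer cell boundaries that split the map into an n x n grid.
        ys = np.linspace(0, H, n + 1).astype(int)
        xs = np.linspace(0, W, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = fmap[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                pooled.append(cell.max(axis=(0, 1)))  # one value per channel
    return np.concatenate(pooled)

def fuse_scores(score_list, weights):
    """Decision-level fusion: weighted sum of per-classifier class scores.

    score_list holds M arrays of shape (N, num_classes); weights has length M.
    Fixed weights are an assumption, not a detail given in the abstract.
    """
    scores = np.stack(score_list)            # (M, N, num_classes)
    w = np.asarray(weights, dtype=float)     # (M,)
    return np.tensordot(w, scores, axes=1)   # (N, num_classes)
```

With the assumed levels, an (H, W, C) feature map is reduced to a fixed-length vector of 21 * C values regardless of H and W, which is what allows images of different sizes to feed the same classifier. In a decision-level setup, one classifier per feature type (for example RGB and depth) would produce the score arrays passed to fuse_scores, and the predicted class is the argmax of the fused scores.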
Keywords/Search Tags: Deep learning, RGB-D object recognition, K sparse auto encoder, Spatial pyramid max pooling, Multi-modal feature fusion