
Research On RGB-D Object Recognition Via Feature Learning

Posted on: 2017-11-25
Degree: Master
Type: Thesis
Country: China
Candidate: W Li
Full Text: PDF
GTID: 2348330503989776
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Recently, with the emergence of low-cost depth imaging sensors (e.g., Microsoft Kinect), RGB and depth (RGB-D) images can be captured simultaneously and cheaply. Compared with the standard 2D image-based paradigm, depth information brings extra descriptive cues (e.g., surface geometry) for object characterization. This raises the research question of how to improve object categorization by making good use of RGB-D information. The intuition is that an RGB image cannot capture the fine 3D geometric properties of an object, whereas depth information has the potential to overcome this limitation. More descriptive object features can therefore be extracted, and a performance improvement is expected. It is thus meaningful to develop powerful and representative features for RGB-D images.

To use 3D geometry effectively in conjunction with 2D color structure, and to extract expressive and comprehensive features from RGB-D images, we study more efficient and robust RGB-D object recognition algorithms. In this thesis, two novel feature-learning-based algorithms are developed and applied to RGB-D object recognition tasks. We achieve very promising performance on a large-scale RGB-D dataset. The main contents are as follows.

First, a novel hybrid RGB-D object categorization model is proposed. It draws on two state-of-the-art image representation techniques: the Convolutional Neural Network (CNN) and the Fisher Vector (FV). Specifically, objects are characterized in the RGB domain by a CNN. A CNN is not applied to the depth domain because there are not enough samples for training. Instead, we extract the depth representation via FV encoding of densely sampled HONV descriptors. The CNN and FV descriptions are then fused to form the RGB-D object signature, and an SVM makes the final decision.
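The fusion step described above can be sketched as follows. This is a minimal illustration that assumes fusion means L2-normalizing each descriptor and concatenating them; the abstract does not state the actual fusion rule, and the function names `l2_normalize` and `fuse_signature` are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit Euclidean length (zero vectors are copied)."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return list(vec)
    return [v / norm for v in vec]


def fuse_signature(cnn_rgb_feature, fv_depth_feature):
    """Build one RGB-D signature from a CNN (RGB) and an FV (depth) descriptor.

    Assumption: fusion is plain concatenation of the two L2-normalized
    descriptors, so that neither modality dominates by scale alone.
    """
    return l2_normalize(cnn_rgb_feature) + l2_normalize(fv_depth_feature)
```

The fused signature would then be passed to a classifier such as an SVM; normalizing before concatenation is one common way to keep the two modalities on a comparable scale.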
Experiments on a large RGB-D dataset demonstrate that our hybrid RGB-D object recognition model outperforms state-of-the-art approaches by large margins (at least 6.3%).

Second, we consider RGB-D image feature representation from the perspective of multi-view feature representation and propose a novel unsupervised feature learning framework, Convolutional Matching Pursuit (CMP), to improve depth object recognition. In particular, the raw RGB data is transformed into three specific types (RGB color space, HSV color space, and color gradients). Because surface normals describe the orientation of the local surfaces of 3D objects more precisely than raw depth data, we transform the raw depth data into normals. To make better use of the normal information, we also compute the gradient and the coordinate angles (the azimuthal and zenith angles). We then treat the RGB color space, HSV color space, color gradients, normals, gradient, and coordinate angles as multi-view information. CMP learns a feature representation for each view separately in an unsupervised setting. The final rich feature is the concatenation of the representations of all six views. Experiments on various datasets show that the features learned from the multi-view information with CMP achieve very promising performance with linear support vector machines.
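As a rough illustration of the depth-derived views described above, the sketch below estimates a surface normal at each interior pixel of a depth map and converts it to azimuth and zenith angles. The central-difference estimator, the unit pixel spacing, and the names `surface_normals` and `normal_angles` are assumptions made for illustration; the abstract does not say how the thesis computes normals.

```python
import math


def surface_normals(depth):
    """Estimate unit surface normals at the interior pixels of a depth map.

    `depth` is a list of equal-length rows of floats. Assumption: unit pixel
    spacing and a surface z = depth[y][x], so the normal is proportional to
    (-dz/dx, -dz/dy, 1). Returns a dict mapping (y, x) to (nx, ny, nz).
    """
    rows, cols = len(depth), len(depth[0])
    normals = {}
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Central differences approximate the depth gradient.
            dzdx = (depth[y][x + 1] - depth[y][x - 1]) / 2.0
            dzdy = (depth[y + 1][x] - depth[y - 1][x]) / 2.0
            length = math.sqrt(dzdx * dzdx + dzdy * dzdy + 1.0)
            normals[(y, x)] = (-dzdx / length, -dzdy / length, 1.0 / length)
    return normals


def normal_angles(normal):
    """Convert a unit normal to (azimuth, zenith) angles in radians."""
    nx, ny, nz = normal
    azimuth = math.atan2(ny, nx)                 # angle in the image plane
    zenith = math.acos(max(-1.0, min(1.0, nz)))  # angle from the viewing axis
    return azimuth, zenith
```

For a perfectly flat patch the zenith angle is zero and the azimuth is not meaningful, so a real pipeline would need to decide how to treat such pixels.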
Keywords/Search Tags: Object recognition, Deep learning, Fisher vector encoding, Sparse coding, Multi-view learning