
Research On Key Techniques Of RGB-D Image Content Analysis

Posted on: 2019-07-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Y Xu
Full Text: PDF
GTID: 1318330545975110
Subject: Computer Science and Technology
Abstract/Summary:
With the development and popularization of commercial RGB-D sensors and the continuous growth of RGB-D data, research on RGB-D media analysis has been greatly promoted. RGB-D image content analysis, which comprises a series of techniques for feature representation, semantic understanding, and intelligent cognition of RGB-D images, benefits people's study, work, and life, and has broad application prospects and potential economic and social value. Most existing work treats depth images no differently from RGB images and pays little attention to exploring the characteristics of RGB-D images. In this thesis, starting from an investigation of the inherent properties of RGB-D images, we develop effective techniques to boost several content analysis tasks, including analyzing basic patterns, extracting semantic labels, and framing practical applications, in which the differences and correlations between RGB and depth images are fully explored. Moreover, these techniques play a key role in supporting the analysis, processing, and application of RGB-D image data. The main work and contributions of this thesis are summarized as follows:

1. A novel method for RGB-D objectness estimation based on adaptive integration of RGB and depth information is proposed, which takes full advantage of multi-modal data and effectively improves the recall and stability of the estimation. Objectness estimation is often hurt by ambiguous areas, especially highly textured regions in RGB images. In comparison, depth images can provide a "clear" view of object structure, but their discriminative power decays rapidly as the distance between the object and the viewer increases. By investigating the respective advantages of RGB and depth images, we propose an RGB-D "class-agnostic" object description method for objectness estimation that adaptively integrates RGB and depth cues. Based on depth priors, distracting regions inside objects can be effectively suppressed. Meanwhile, object boundaries can be emphasized by the complementary informative parts of the depth and color gradient maps. Experimental results demonstrate that the proposed method effectively improves the recall and stability of objectness estimation. Furthermore, we build a stereo image dataset for objectness estimation, which can be used as a benchmark for related research.

2. A novel multi-modal deep feature learning method for RGB-D object detection is proposed, which learns modality-specific and modality-correlated feature representations of RGB and depth images and significantly improves object detection performance. Most existing RGB-D object detection methods treat depth images indistinguishably, which fails to explore the correlations between the two modalities and leads to sub-optimal results. Motivated by the intuition that different modalities should contain not only modality-specific information but also modality-correlated information, we propose to learn correlated features that are shared between the RGB and depth modalities, as well as specific features that are captured only within each single modality, for RGB-D object detection. Experimental results on two public benchmark datasets show that, by introducing the modality-correlated feature representation, the proposed multi-modal RGB-D objectness estimation approach and object detection approach substantially outperform state-of-the-art competitors.

3. A novel method for RGB-D scene recognition based on an image-to-image translation model is proposed, which automatically builds the relationship between RGB and depth images and provides a significant performance boost over learning from scratch. Due to the limited size of RGB-D datasets, most existing RGB-D scene recognition methods fine-tune depth-specific models from RGB-specific models pre-trained on large-scale labeled RGB datasets, which results in biased depth features and fails to investigate the relationships between the two domains. Without using any extra data, we propose to learn an image-to-image translation model from scratch to generate RGB-D images. While the image generation model is being trained, the underlying relationships between the visual appearance of a scene and its structural layout can be explored automatically, which provides richer features for subsequent RGB-D scene recognition. Experimental results on two public benchmark datasets show that, without using any external data, our RGB-D scene recognition models perform comparably to state-of-the-art methods. More importantly, our approach enables cross-modal scene recognition in a seamless manner, which will facilitate practical applications in constrained situations.

4. A novel complete framework for object-based stereo image retrieval is developed, which incorporates RGB and depth information to automatically extract multiple salient objects and thereby reduces the distraction of irrelevant background regions during retrieval. With the rapid increase of stereo image data, how to manage and access it efficiently has become an urgent problem, much as it was for digital images about two decades ago. By extending conventional object-based image retrieval, we introduce a complete framework for object-based stereo image retrieval. With the help of recovered depth images, we propose an automatic salient object extraction method, which can be used not only to build an object-level image index but also as an online service called query recommendation. In addition, we propose a novel distributed cluster-based locality-sensitive hashing (CLSH) framework, which aims to index and search large-scale high-dimensional feature data. The experiments demonstrate that our method can successfully extract salient objects and that retrieval is more efficient and effective. We also build and release a stereo image retrieval benchmark dataset for related research.
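
To make the idea of indexing high-dimensional features with locality-sensitive hashing more concrete, here is a minimal single-machine sketch in Python. It is not the CLSH framework from the thesis, whose distributed and cluster-based design is not described in this abstract. It uses plain random-hyperplane LSH for cosine similarity, and every name in it (`LSHIndex`, `hash_vector`, `num_bits`, `num_tables`) is a hypothetical choice made for illustration.

```python
import math
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_vector(vec, hyperplanes):
    """Pack the signs of the projections onto each hyperplane into an int."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vec))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


class LSHIndex:
    """Illustrative multi-table LSH index (hypothetical, not the thesis's CLSH)."""

    def __init__(self, dim, num_bits=8, num_tables=4, seed=0):
        # Each table has its own hyperplanes and its own bucket map.
        self.tables = [
            (make_hyperplanes(num_bits, dim, seed + t), defaultdict(list))
            for t in range(num_tables)
        ]
        self.vectors = {}

    def add(self, key, vec):
        self.vectors[key] = vec
        for planes, buckets in self.tables:
            buckets[hash_vector(vec, planes)].append(key)

    def query(self, vec, top_k=3):
        # Collect candidates that collide with the query in any table,
        # then re-rank only those candidates by exact cosine similarity.
        candidates = set()
        for planes, buckets in self.tables:
            candidates.update(buckets.get(hash_vector(vec, planes), []))
        ranked = sorted(candidates, key=lambda k: -cosine(vec, self.vectors[k]))
        return ranked[:top_k]
```

In this sketch, vectors pointing in similar directions tend to share hash codes, so a query compares itself only against the items in its own buckets rather than the whole collection. Using several tables trades memory for a higher chance that a true neighbor collides with the query in at least one of them.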
Keywords/Search Tags: RGB-D images, Depth images, Multi-modal feature learning, Objectness estimation, Object detection, Scene recognition, Content-based image retrieval, High-dimensional feature index building and searching