
Research On Multi-view Feature Learning For 3D Model Retrieval

Posted on: 2020-01-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D Wang
Full Text: PDF
GTID: 1368330614950674
Subject: Mechanical and electrical engineering

Abstract/Summary:
3D model retrieval is widely used in computer-aided design, robotic manipulation, augmented reality, and other fields. This dissertation aims at developing novel methods to retrieve relevant models from 3D databases with high accuracy and efficiency. Unlike the matrix representation of an image, a 3D model is usually stored in a 3D data structure that can be described in various ways. In particular, describing a 3D model with a group of projected views is known as the view-based approach; it is flexible and consistently performs well in classification and retrieval. Traditional view-based 3D model retrieval consists of four steps: view capture, representative view selection, feature extraction, and multi-view matching. This pipeline, however, has changed considerably with the advent of deep learning. Feature representation plays the key role in model retrieval, so how to effectively extract features from model views has become a hot research topic in this field. With the rapid development of deep learning, the performance of image representation and feature extraction has improved substantially, and view-based methods have shown their superiority with the help of powerful 2D image features from deep networks. In addition, end-to-end learning greatly reduces the computational cost of view-based 3D retrieval. Learning excellent view-based features nevertheless remains challenging. On one hand, the explosion of 3D data has produced an abundance of 3D models, which greatly increases the complexity of the data. On the other hand, designing simple and reliable networks that learn features which are compact within each class and well separated between classes is still far from a desirable level.

This dissertation studies feature learning for view-based 3D model retrieval and pose estimation. According to how the features of multiple views are organized in the retrieval pipeline, the features can be divided into decision-level fusion features and feature-level fusion features. For decision-level fusion, view features are learned in both unsupervised and supervised ways. For feature-level fusion, the work studies single-model features used to retrieve 3D models for a given query model, and learns multi-modal features to retrieve relevant 3D models for a given 3D object. Specifically, the main contents and contributions are summarized in the following four aspects.

Firstly, this dissertation proposes to learn view features in an unsupervised way. Off-the-shelf convolutional neural networks trained on ImageNet are employed to extract view features, and the handcrafted features in the traditional retrieval pipeline are replaced with these deep features. By comparing the retrieval performance of the two feature types on standard benchmarks, the representational power of convolutional neural networks is fully demonstrated. The characteristics of deep features from the fully connected layers are analyzed, and multi-graph learning is employed to capture the higher-order relations among the multiple views. To reduce the dimensionality of the deep features, a sparse autoencoder is used to speed up retrieval while losing little view information. All of these studies are carried out in an unsupervised manner that requires no labelled data, so the proposed algorithm can be applied to new types of 3D models.
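To make this first contribution concrete, the following is a minimal sketch of unsupervised view-feature extraction with an ImageNet-pretrained CNN followed by a sparse autoencoder for dimensionality reduction. The ResNet-18 backbone, the 128-dimensional code, and the L1 sparsity weight are illustrative assumptions; the abstract does not specify the dissertation's exact architecture.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Off-the-shelf ImageNet-pretrained backbone used as a frozen view-feature extractor.
# ResNet-18 and the 128-d code size are assumptions for illustration only.
backbone = models.resnet18(pretrained=True)
backbone.fc = nn.Identity()            # keep the 512-d global feature, drop the classifier
backbone.eval()

class SparseAutoencoder(nn.Module):
    """Compresses 512-d view features to 128-d codes with an L1 sparsity penalty."""
    def __init__(self, in_dim=512, code_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, code_dim), nn.ReLU())
        self.decoder = nn.Linear(code_dim, in_dim)

    def forward(self, x):
        code = self.encoder(x)
        recon = self.decoder(code)
        return code, recon

def autoencoder_loss(x, code, recon, sparsity_weight=1e-3):
    # Reconstruction term preserves view information; L1 term enforces sparse codes.
    return nn.functional.mse_loss(recon, x) + sparsity_weight * code.abs().mean()

# views: (num_views, 3, 224, 224) rendered projections of one 3D model
views = torch.randn(12, 3, 224, 224)
with torch.no_grad():
    view_features = backbone(views)                    # (12, 512) per-view deep features
ae = SparseAutoencoder()
codes, recon = ae(view_features)
loss = autoencoder_loss(view_features, codes, recon)   # minimized on unlabelled views
```

At retrieval time, such compact codes would stand in for the handcrafted view descriptors of the traditional pipeline, with multi-view matching or multi-graph learning operating on them.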
Secondly, independent view features are studied, and each view's feature is computed by a trained convolutional neural network. Off-the-shelf networks are taken as the backbone and fine-tuned by minimizing a classification loss on the training views. View-feature fusion with a recurrent neural network is studied further: the multiple views are treated as a sequence fed into the recurrent network, which fuses the views recurrently and outputs a single feature vector as the final representation. The differing importance of the views is investigated in depth, and an attention layer is introduced to weight the view features. A Siamese architecture is then used to train the recurrent network and increase its discriminative ability.

Thirdly, based on the feature-level fusion network containing a view-pooling layer, triplet-loss and center-loss based methods for learning model features are studied in detail, and two novel losses are proposed that substantially improve retrieval performance. The difference between classification and retrieval is analyzed, and it is pointed out that directly using features learned by a classifier for retrieval is inappropriate, which indicates that relative distance is crucial for improving retrieval performance. Based on an analysis of existing triplet losses, this dissertation proposes the cube loss for triplet training; for the same training batch size, the proposed loss can mine many more hard positive and hard negative samples than other methods. In addition, the effect of the center loss on the feature distribution is studied, and the center-push loss is proposed to limit intra-class divergence while enlarging inter-class distances.

Finally, this dissertation studies the retrieval of 3D models, together with pose information, for a given 3D object. To tackle this cross-modal retrieval problem, a network is designed to learn a joint embedding space shared by 3D objects in natural images and their corresponding CAD models. Building on the center-push loss of the previous chapter, the triplet-center-push loss is proposed for learning the joint embedding features; the proposed loss performs very well for retrieving unseen objects in natural environments. After analyzing and summarizing related work on 3D pose estimation, a classification-regression model is employed to estimate the object pose. The view features, view classifier, view pose residual, and view pose classifier are learned in a step-by-step manner. A retrieval system that can search for related 3D models, with pose information, for the central object in a real scene is developed at the end.

In summary, this dissertation focuses on learning multi-view features for view-based 3D model retrieval from the perspectives of decision-level fusion and feature-level fusion. It studies in depth the application of deep learning techniques to multi-view feature representation, designs a series of feature learning strategies, and proposes several loss functions that improve 3D model retrieval performance.
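As a sketch of the decision-level fusion described in the second contribution, the fragment below feeds per-view CNN features to a GRU and weights the hidden states with a learned attention layer. The GRU, the single-layer attention, and all dimensions are assumptions for illustration, not the dissertation's exact architecture.

```python
import torch
import torch.nn as nn

class AttentiveViewFusion(nn.Module):
    """Fuses a sequence of per-view features into one model-level descriptor.

    A GRU reads the views in order; an attention layer scores each hidden
    state so that more informative views contribute more to the fused vector.
    Dimensions (512-d view features, 256-d hidden state) are illustrative.
    """
    def __init__(self, view_dim=512, hidden_dim=256):
        super().__init__()
        self.rnn = nn.GRU(view_dim, hidden_dim, batch_first=True)
        self.attention = nn.Linear(hidden_dim, 1)

    def forward(self, view_feats):                 # (batch, num_views, view_dim)
        hidden, _ = self.rnn(view_feats)           # (batch, num_views, hidden_dim)
        scores = self.attention(hidden)            # (batch, num_views, 1)
        weights = torch.softmax(scores, dim=1)     # per-view importance weights
        return (weights * hidden).sum(dim=1)       # (batch, hidden_dim) fused feature

# A Siamese setup would pass two view sequences through the *same* module and
# train with a contrastive or triplet objective on the fused descriptors.
fusion = AttentiveViewFusion()
desc_a = fusion(torch.randn(4, 12, 512))
desc_b = fusion(torch.randn(4, 12, 512))
```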
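For the feature-level fusion branch of the third contribution, the sketch below pools per-view features into a single model feature and trains it with a standard triplet margin loss as a stand-in for the proposed cube loss and center-push loss, whose exact formulations are not given in this abstract. The max view pooling, embedding size, and margin are assumptions.

```python
import torch
import torch.nn as nn

class ViewPoolingNet(nn.Module):
    """Feature-level fusion: per-view CNN features are max-pooled across views
    into one model-level feature, then projected by an embedding head."""
    def __init__(self, view_dim=512, embed_dim=128):
        super().__init__()
        self.embed = nn.Linear(view_dim, embed_dim)

    def forward(self, view_feats):                 # (batch, num_views, view_dim)
        pooled, _ = view_feats.max(dim=1)          # view-pooling layer (max over views)
        return nn.functional.normalize(self.embed(pooled), dim=1)

# Metric learning on model features: anchor and positive share a class, negative does not.
# A standard triplet margin loss stands in for the dissertation's cube loss here.
net = ViewPoolingNet()
anchor   = net(torch.randn(8, 12, 512))
positive = net(torch.randn(8, 12, 512))
negative = net(torch.randn(8, 12, 512))
triplet = nn.TripletMarginLoss(margin=0.2)         # margin chosen for illustration
loss = triplet(anchor, positive, negative)
```

The design point stressed in the text is that such relative-distance objectives, rather than a plain classification loss, shape the embedding so that intra-class features stay compact while inter-class features are pushed apart.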
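The pose estimation in the final contribution follows a classification-regression scheme. The sketch below shows one common instantiation of that idea, in which a coarse azimuth bin is classified and a per-bin residual is regressed; the 12-bin discretization and the single-angle case are illustrative assumptions, not the dissertation's exact design.

```python
import torch
import torch.nn as nn

class PoseHead(nn.Module):
    """Classification-regression pose head: classify a coarse azimuth bin,
    then regress a continuous residual inside the predicted bin."""
    def __init__(self, feat_dim=512, num_bins=12):
        super().__init__()
        self.bin_classifier = nn.Linear(feat_dim, num_bins)   # which 30-degree bin
        self.residual_reg   = nn.Linear(feat_dim, num_bins)   # residual within each bin

    def forward(self, feat):
        return self.bin_classifier(feat), self.residual_reg(feat)

def pose_angle(bin_logits, residuals, num_bins=12):
    # Predicted angle = center of the chosen bin + its regressed residual
    # (residual expressed as a fraction of the bin width).
    bin_size = 360.0 / num_bins
    bin_idx = bin_logits.argmax(dim=1)
    res = residuals.gather(1, bin_idx.unsqueeze(1)).squeeze(1)
    return bin_idx.float() * bin_size + bin_size / 2 + res * bin_size

head = PoseHead()
logits, residuals = head(torch.randn(4, 512))
angles = pose_angle(logits, residuals)             # (4,) azimuth estimates in degrees
```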
Keywords/Search Tags: 3D model retrieval, multi-view representation, feature learning, feature fusion, pose estimation