
Study On Multi-view Based 3D Model Retrieval

Posted on: 2021-12-21    Degree: Doctor    Type: Dissertation
Country: China    Candidate: X W He    Full Text: PDF
GTID: 1488306107456264    Subject: Information and Communication Engineering
Abstract/Summary:
3D models, or 3D objects, as the basic elements of the 3D world, play an essential role in how humans perceive and understand the world. With the development of computer technology, enabling computers to sense and understand 3D models has become a natural and long-standing topic in both academia and industry. To achieve this goal, various computer vision tasks have been proposed, for instance 3D scene segmentation and 3D object detection. Among them, 3D model retrieval, which aims to retrieve 3D models with the same semantic content (e.g., label) from a database for a given 3D model query, is undoubtedly one of the most important tasks. It has drawn great attention due to its direct applicability to designing 3D model search engines, as well as its great potential in applications such as autonomous driving, virtual/augmented reality, 3D printing, and medicine. Building on existing deep learning techniques, this thesis mainly investigates ways to improve the performance of multi-view-based 3D model retrieval. Specifically, the research achievements are as follows:

(1) We propose a novel framework to effectively aggregate multi-view features. The framework first treats each view image as one "word" describing the 3D model and, following the idea of n-gram models in the Natural Language Processing (NLP) field, densely decomposes the view sequence into a set of overlapping subsequences (each subsequence is called a view n-gram). It then computes a feature for each visual n-gram and finally fuses these features with an attention module. Since different visual n-gram sizes capture adjacent views at different scales, we further propose to combine features of visual n-grams of different sizes. In this way, we obtain more discriminative features by taking the spatial relations among local adjacent views into account. Even without metric learning such as the triplet-center loss, we obtain very competitive performance on popular 3D shape benchmarks: 88.9% mAP on ModelNet40 and 92.8% mAP on ModelNet10. We also perform experiments on SHREC 2016 and outperform many existing state-of-the-art methods by a large margin.
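Since the abstract gives no code, the following is a minimal PyTorch sketch of the view n-gram construction described in (1): per-view backbone features are split into stride-1 overlapping windows and fused with a small attention module. All names (view_ngrams, NGramAttentionFusion), the mean-pooling of each n-gram, and the single-scale setup are illustrative assumptions, not the thesis's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def view_ngrams(view_feats: torch.Tensor, n: int) -> torch.Tensor:
    """Split a view sequence into densely overlapping n-grams.

    view_feats: (V, D) per-view features from a shared CNN backbone.
    Returns:    (V - n + 1, n, D) stride-1 overlapping subsequences.
    """
    return view_feats.unfold(0, n, 1).permute(0, 2, 1)

class NGramAttentionFusion(nn.Module):
    """Fuse per-n-gram features with simple softmax attention (illustrative)."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, ngram_feats: torch.Tensor) -> torch.Tensor:
        # ngram_feats: (G, D), one pooled feature per visual n-gram.
        w = F.softmax(self.score(ngram_feats), dim=0)  # (G, 1) attention weights
        return (w * ngram_feats).sum(dim=0)            # (D,) shape descriptor

# Toy usage with 12 views of a model; a multi-scale variant would repeat
# this for several n-gram sizes (e.g., n = 2, 3, 4) and combine the results.
feats = torch.randn(12, 512)         # stand-in for CNN view features
grams = view_ngrams(feats, n=3)      # (10, 3, 512)
descriptor = NGramAttentionFusion(512)(grams.mean(dim=1))
```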
(2) A novel metric-learning loss function named the Triplet-Center Loss (TCL) is proposed for 3D model retrieval. During training, the loss constrains the deep 3D representations of the same class to cluster more compactly while at the same time keeping representations of different classes well separated; adding this loss drives the network to learn more suitable features. We verify the effectiveness of the method on several 3D model benchmarks, including ModelNet40 and ShapeNet Core55. Besides, we conduct experiments on several sketch-based 3D model retrieval benchmarks and achieve superior results, improving on many state-of-the-art methods by over 5% in mAP (a minimal code sketch of such a loss appears after the summary below).

(3) We propose two important improvements that strengthen an existing architecture (i.e., MVCNN) to learn more suitable 3D representations. The first improves its pooling strategy by encouraging the network to learn group-view similarity: adding a group-view similarity learning branch before the pooling operation avoids losing similarity information among the multi-view images, which benefits global representation learning for view-based 3D model retrieval. The second improves the Triplet-Center Loss so that its margin hyperparameter changes dynamically according to the separability between samples of different classes; the resulting adaptive margin-based triplet-center loss (AMTCL) is more flexible and helps the network learn a more discriminative feature space. Experimental results show that the two improvements help the original MVCNN obtain better retrieval performance than most state-of-the-art methods on several 3D shape benchmarks.

(4) We propose a correspondence-aware fusion framework for multi-view images and point cloud data. The framework first calculates local correspondence scores between the multi-view images and the point cloud, then filters out low scores so that only the more salient local correspondences enter the fusion process, and finally fuses the two modalities both bidirectionally and hierarchically to obtain more informative features (the scoring-and-filtering step is sketched after the summary below). Comprehensive evaluations on popular 3D model benchmarks show its effectiveness: by fusing the two modalities, we obtain 92.9% mAP on ModelNet40, significantly outperforming many state-of-the-art methods.

In summary, this thesis is mainly concerned with multi-view-based 3D model retrieval and proposes a series of solutions that improve representation learning for 3D models, providing effective support for future research in 3D model retrieval. Furthermore, the theory, architectures, and loss functions in this thesis may also be instructive and beneficial for research on other computer vision problems.
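The abstract describes TCL in (2) only at a high level; the sketch below shows one plausible triplet-center formulation with learnable class centers and a hinge between the distance to a sample's own center and the distance to the nearest other-class center. The margin value, the Euclidean distance via torch.cdist, and the class/feature sizes are assumptions, and in practice such a loss is usually combined with a cross-entropy term rather than used alone.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TripletCenterLoss(nn.Module):
    """Sketch of a triplet-center loss: pull each feature toward its own class
    center while pushing it away from the nearest other-class center."""

    def __init__(self, num_classes: int, feat_dim: int, margin: float = 5.0):
        super().__init__()
        self.margin = margin
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        dists = torch.cdist(feats, self.centers)              # (B, C) distances
        pos = dists.gather(1, labels.view(-1, 1)).squeeze(1)  # to own center
        # Mask out the true class, then take the closest negative center.
        neg = dists.scatter(1, labels.view(-1, 1), float('inf')).min(dim=1).values
        # The adaptive-margin variant (AMTCL) from contribution (3) would
        # replace the constant self.margin with a separability-driven value.
        return F.relu(pos + self.margin - neg).mean()

# Toy usage: 8 features of dimension 512 over 40 classes.
loss_fn = TripletCenterLoss(num_classes=40, feat_dim=512)
loss = loss_fn(torch.randn(8, 512), torch.randint(0, 40, (8,)))
```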
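For the correspondence scoring and filtering in (4), the following is a deliberately simplified, one-directional sketch: cosine-similarity scores between local image and point features, top-k filtering of the most salient correspondences, and a placeholder additive fusion. The bidirectional and hierarchical parts of the framework are omitted, and the function name and keep_ratio parameter are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def correspondence_fusion(img_feats: torch.Tensor,
                          pts_feats: torch.Tensor,
                          keep_ratio: float = 0.5) -> torch.Tensor:
    """Fuse local image features (N, D) into point features (M, D) through
    filtered correspondences (point-to-image direction only)."""
    # Cosine-similarity score for every (image patch, point) pair.
    scores = F.normalize(img_feats, dim=-1) @ F.normalize(pts_feats, dim=-1).T
    best, idx = scores.max(dim=0)                  # best image match per point
    k = max(1, int(keep_ratio * best.numel()))
    keep = best.topk(k).indices                    # keep only salient matches
    fused = pts_feats.clone()
    fused[keep] = fused[keep] + img_feats[idx[keep]]  # placeholder fusion
    return fused

# Toy usage: 64 local image features fused into 256 point features.
fused_pts = correspondence_fusion(torch.randn(64, 128), torch.randn(256, 128))
```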
Keywords/Search Tags: 3D Model Retrieval, Multi-view Images, Point Cloud, Similarity Learning, Loss Function, Convolutional Neural Networks, Multi-modal Fusion