| In recent years, with the spread of 3D acquisition equipment and the wide application of 3D processing technology, especially amid the accelerated digitization of the metaverse era, 3D model data has grown explosively. In the era of 3D big data, convenient acquisition and management of 3D models has become a research hotspot in multimedia analysis and retrieval, and it is of significant practical value for applications such as the intelligent management of digital content in the metaverse. Based on a thorough analysis of current research on 3D model retrieval, this thesis conducts in-depth research on multi-view representation learning around four central scientific issues: multi-view saliency representation learning under variable physical space, multi-view consistent representation learning under variable feature space, multi-view transferable representation learning in heterogeneous space, and multi-view robust representation learning under adversarial conditions. (1) For multi-view saliency representation learning under variable physical space, this thesis proposes a multi-view latent spatial context-guided deep neural network. This method is designed for the case where the query and the candidates are both 3D models. It extracts the local saliency information of each single view and the global structure information of the view sequence, and uses them to explore the multi-view latent spatial context for representative view selection. It thereby realizes multi-view saliency representation learning in a data-driven manner, without relying on a fixed physical-space prior. Because it reduces the dependence on view number and view order, the method achieves competitive performance with only one-tenth of the random view information used by the representative method. (2) For multi-view consistent representation learning under variable feature space, this thesis proposes a collaborative distribution alignment network. This method is
designed for the case where the query is a 2D image and the candidates are 3D models. It fuses the features of multiple views according to the similarity between the query 2D image and each individual view. To reduce the domain shift in feature space, it performs multi-view representation learning at both the domain level and the class level, yielding a multi-view representation consistent with the 2D representation. The method increases the feature-space similarity between a 2D image and the multiple views of same-class 3D models, so that 3D models of the same class rank higher in the retrieval results, and it achieves an improvement of more than 15% in Average Normalized Modified Retrieval Rank. (3) For multi-view transferable representation learning in heterogeneous space, this thesis proposes a hierarchical instance feature alignment network. This method is also designed for the case where the query is a 2D image and the candidates are 3D models. It maximizes the mutual information between the input and its feature representation to explore the discriminative knowledge of each modality. With this information, it hierarchically uses global domain-level statistical information and local class-level semantic information to perform transferable multi-view representation learning. The method transfers the discriminative knowledge of the same class across domains, reduces the intra-class distribution difference, and outperforms the methods from the SHREC’2019 Monocular Image-based 3D Model Retrieval competition by more than 12% in First Tier. (4) For multi-view robust representation learning under adversarial conditions, this thesis proposes a progressive adversarial training network. This method generates inter-domain separated or inter-class compact adversarial query samples based on cross-domain data similarity. It designs an easy-to-hard training strategy that progressively feeds the two types of adversarial query samples into the network to
improve the stability of the feature extractor, thereby achieving multi-view robust representation learning. The method effectively reduces the noise introduced by adversarial query samples, significantly improves the stability of the baseline method under adversarial conditions, and improves Nearest Neighbor Precision by more than 20%. |