Due to the dramatic viewpoint change between unmanned aerial vehicle (UAV) and satellite images, the two views differ greatly in visual appearance, the performance of existing cross-view image matching methods is difficult to improve, and UAV visual positioning therefore remains challenging. Based on a study of a large number of domestic and foreign cross-view image matching algorithms, this thesis analyzes the shortcomings of existing models and improves them at three levels: pixel level, feature level, and decision level. The main research work is as follows.

(1) To address the excessive spatial-domain difference between the UAV and satellite views and the neglect of spatial layout information, a proactive generative model based on viewpoint transformation is proposed at the pixel level, combining hand-crafted features with deep features. The model first applies inverse perspective mapping (IPM) for coordinate transformation, explicitly bridging the spatial-domain difference so that the spatial geometric features of the projected image approximate those of the real satellite image. It then implicitly matches and restores image content and texture at a fine-grained level with the newly proposed cross-view generative adversarial network (CVGAN), synthesizing a smoother and more realistic satellite image. Finally, extensive qualitative and quantitative experiments verify the generative effect of the model and show that the spatial-domain difference is initially bridged.

(2) To address the large difference in the discriminability of viewpoint-invariant features extracted from the UAV and satellite views, a posterior image retrieval model based on a multi-view multi-supervised network (MMNet) is proposed at the feature level using deep features. It consists of four innovative modules. Multi-supervised learning: jointly performing representation learning and metric learning effectively alleviates the single-representation features and the polarized inter-class and intra-class distances caused by single-supervised learning. Multi-scale feature fusion: fusing global and local features effectively alleviates the degradation in matching performance caused by the missing contextual information, rotation, and position offset of a single feature scale. Re-weighted regularization and multi-view balanced sampling: these two strategies effectively overcome the imbalance of view samples in the dataset. The proposed MMNet achieves 83.97% Recall@1 and 86.96% average precision (AP) on the UAV dataset University-1652. The experimental results show that MMNet extracts salient, geometrically consistent viewpoint-invariant features and effectively improves cross-view image matching performance.

(3) To address the performance bottleneck caused by treating viewpoint-invariant features and viewpoint transformation as fragmented, separate methods, a multi-task joint learning model (MJLM) based on deep-feature adversarial decision is proposed at the decision level. Its main idea is to jointly process the cross-view image generation task and the retrieval task within one aggregated framework, fusing the viewpoint-transformation and viewpoint-invariant-feature approaches. Specifically, the model maps a given UAV-satellite image pair into latent feature spaces, establishes associations between them, and uses these features to accomplish both tasks. On the one hand, the posterior retrieval task ensures that the generated satellite image closely approximates the real satellite image; on the other hand, the proactive generative task enables MJLM to learn geometrically consistent viewpoint-invariant features between the two views. In experiments on University-1652, MJLM reaches 87.54% R@1 and 89.22% AP on the UAV positioning task, improvements of 4.25% and 2.60% over MMNet. The experimental results demonstrate that the model further improves cross-view image matching performance, outperforms other cutting-edge methods, and performs well in terms of accuracy and robustness.
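The IPM step in contribution (1) is, at its core, a ground-plane homography that warps the oblique UAV view toward a nadir, satellite-like view. A minimal NumPy sketch, assuming four hypothetical ground-plane correspondences (the thesis's actual camera parameters and calibration are not given here):

```python
import numpy as np

def fit_homography(src, dst):
    """Estimate the 3x3 homography H with dst ~ H @ src via DLT (4+ point pairs)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # The homography is the null vector of A: the smallest right singular vector.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 3)

def warp_point(H, pt):
    """Apply H to a 2-D point with homogeneous normalization."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return np.array([x / w, y / w])

# Hypothetical correspondences: a ground-plane trapezoid seen in the oblique
# UAV image maps to a square in the top-down (satellite-like) image plane.
uav_pts = [(120, 300), (520, 300), (620, 460), (20, 460)]
sat_pts = [(0, 0), (400, 0), (400, 400), (0, 400)]
H = fit_homography(uav_pts, sat_pts)
```

Warping every pixel of the UAV image through `H` yields the projected image whose geometry approximately matches the satellite view; CVGAN then restores the content and texture that the purely geometric warp cannot recover.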
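The multi-supervised learning in contribution (2) combines a representation-learning branch (ID classification) with a metric-learning branch (a triplet-style objective). A minimal sketch of such a joint loss, assuming cross-entropy and Euclidean triplet loss with a hypothetical margin and weight (the thesis's exact loss terms and hyperparameters are not specified here):

```python
import numpy as np

def cross_entropy(logits, label):
    """Representation-learning branch: softmax cross-entropy over location IDs."""
    z = logits - logits.max()                 # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Metric-learning branch: pull the matched UAV/satellite features together,
    push non-matching ones apart by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def multi_supervised_loss(logits, label, anchor, positive, negative, w=1.0):
    """Joint objective: classification plus metric term (weight w is assumed)."""
    return cross_entropy(logits, label) + w * triplet_loss(anchor, positive, negative)
```

Jointly, the classification term keeps features semantically discriminative while the metric term directly shapes inter-class and intra-class distances, countering the polarization that a single supervision signal produces.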
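The aggregated framework in contribution (3) can be pictured as a shared latent encoding that feeds both a generation head and a retrieval head, with their losses summed. The sketch below uses random linear maps and equal loss weights purely for illustration; the actual MJLM architecture, loss terms, and weighting are not detailed in this abstract:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 64-d image features, 16-d shared latent, 8-d embedding.
W_enc = rng.normal(size=(64, 16))   # shared encoder: image feature -> latent
W_gen = rng.normal(size=(16, 64))   # generation head: latent -> satellite-view feature
W_ret = rng.normal(size=(16, 8))    # retrieval head: latent -> matching embedding

def joint_forward(uav_feat, sat_feat):
    """One aggregated pass: shared latents drive both tasks, losses are summed."""
    z_uav, z_sat = uav_feat @ W_enc, sat_feat @ W_enc
    gen = z_uav @ W_gen                      # proactive generative branch
    e_uav, e_sat = z_uav @ W_ret, z_sat @ W_ret  # posterior retrieval branch
    l_gen = np.abs(gen - sat_feat).mean()    # reconstruction (L1) term
    l_ret = np.linalg.norm(e_uav - e_sat)    # matched-pair embedding distance
    return l_gen + l_ret                     # joint objective (equal weights assumed)

loss = joint_forward(rng.normal(size=64), rng.normal(size=64))
```

Because both heads backpropagate into the same encoder, the retrieval gradient constrains the generated view to stay close to the real satellite image, while the generation gradient pushes the shared features toward geometric consistency, which is exactly the coupling the abstract describes.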