Illumination And Viewpoint Invariant Descriptor Based On Convolutional Neural Networks

Posted on: 2021-02-22    Degree: Master    Type: Thesis
Country: China    Candidate: Z Dai    Full Text: PDF
GTID: 2428330611467559    Subject: Computer technology

Abstract/Summary:
Keypoint matching is a fundamental operation in computer vision, especially in 3D reconstruction and visual simultaneous localization and mapping (VSLAM) in robotics. Matching quality depends largely on the keypoint descriptor, yet describing and matching keypoints robustly under extreme disturbances such as illumination and viewpoint changes remains a challenging problem. For the visual navigation of mobile robots in particular, the influence of illumination and viewpoint is especially prominent and is one of the core problems in VSLAM research. The goal of this thesis is to generate a keypoint descriptor that is robust to changes in both illumination and viewpoint, so as to improve matching accuracy on images exhibiting such changes.

First, this thesis surveys existing keypoint descriptors and divides prior work into three classes: hand-crafted descriptors, convolutional neural network (CNN) descriptors derived from specifically trained CNN models, and CNN descriptors derived from universal pre-trained CNN models. It then comparatively studies the ability of these three classes to handle illumination and viewpoint changes. Ten representative descriptors were selected and evaluated on HPatches, the latest benchmark dataset for image keypoint matching. The study found that: (a) CNN-based descriptors outperform hand-crafted descriptors in matching accuracy under both illumination and viewpoint changes; (b) under viewpoint changes, trained CNN descriptors match more accurately than pre-trained CNN descriptors; while (c) under illumination changes, pre-trained CNN descriptors match more accurately than trained CNN descriptors.

Then, exploiting the complementary strengths of trained and pre-trained descriptors in handling illumination and viewpoint changes, we propose a descriptor fusion model (DFM) framework that uses two autoencoders to fuse the two descriptor types and generate a keypoint descriptor robust to both kinds of change. The first autoencoder in DFM is a convolutional autoencoder (CAE), which reduces the dimensionality of the pre-trained descriptor. The second is a fully-connected autoencoder (FCAE), which fuses the trained descriptor with the compressed pre-trained descriptor. We likewise compare these two autoencoders with common dimensionality reduction and data fusion methods on the HPatches dataset and find that: (a) the CAE is a better dimensionality reduction technique than principal component analysis (PCA) and random projection (RP); (b) the compressed pre-trained descriptors outperform the uncompressed descriptors taken from the convolutional layers of pre-trained CNN models; and (c) the FCAE is a better fusion method than element-wise product, summation, and concatenation.

The proposed DFM architecture can accommodate any trained and pre-trained CNN models. Based on the descriptor performance of existing CNN models, we choose HardNet and DenseNet169 as the representative trained and pre-trained models, respectively, and compare the descriptors generated by the DFM framework with other state-of-the-art descriptors on the HPatches dataset. The experimental results show that DFM achieves state-of-the-art performance, with a mean mAP 6.45% and 6.53% higher than that of HardNet and DenseNet169, respectively.
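The abstract does not specify the autoencoder configurations, so the following PyTorch sketch of the CAE stage is only illustrative: the input feature-map shape (256 channels on an 8x8 grid), the layer widths, and the 32-channel code are assumptions, not the thesis's exact design. It shows the general pattern of training a convolutional autoencoder by reconstruction and keeping the bottleneck code as the compressed pre-trained descriptor.

import torch
import torch.nn as nn

class CAE(nn.Module):
    """Convolutional autoencoder: compresses a pre-trained CNN feature map."""
    def __init__(self, in_channels=256, code_channels=32):
        super().__init__()
        # Encoder: two strided convolutions shrink the spatial grid and the
        # channel count, yielding the compressed descriptor (the "code").
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, code_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # Decoder mirrors the encoder; it is only needed during training,
        # where the reconstruction loss forces the code to retain information.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(code_channels, 128, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, in_channels, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
        )

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

# One training step on placeholder feature maps: minimize reconstruction
# error, then keep only the flattened code as the compressed descriptor.
model = CAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.randn(16, 256, 8, 8)          # stand-in pre-trained feature maps
recon, code = model(batch)
loss = nn.functional.mse_loss(recon, batch)
opt.zero_grad()
loss.backward()
opt.step()
compressed = code.flatten(start_dim=1)      # descriptor passed to the FCAE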
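The FCAE stage can be sketched in the same hedged way: the layer widths, the 128-dimensional fused code, and the use of concatenation at the input are assumptions chosen to match HardNet's 128-d output, not the thesis's reported architecture. The point illustrated is that the fusion is learned by an autoencoder over the paired descriptors rather than by a fixed operation such as summation.

class FCAE(nn.Module):
    """Fully-connected autoencoder: fuses trained and compressed descriptors."""
    def __init__(self, trained_dim=128, compressed_dim=128, code_dim=128):
        super().__init__()
        in_dim = trained_dim + compressed_dim
        # Encoder maps the concatenated descriptor pair to the fused code.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 192), nn.ReLU(inplace=True),
            nn.Linear(192, code_dim),
        )
        # Decoder reconstructs the concatenation; used only during training.
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 192), nn.ReLU(inplace=True),
            nn.Linear(192, in_dim),
        )

    def forward(self, trained_desc, compressed_desc):
        x = torch.cat([trained_desc, compressed_desc], dim=1)
        code = self.encoder(x)
        return self.decoder(code), code

# Usage: the fused code is the final descriptor; L2-normalizing it keeps it
# comparable under the Euclidean matching HPatches evaluation assumes.
fcae = FCAE()
t = torch.randn(16, 128)   # stand-in for HardNet descriptors
p = torch.randn(16, 128)   # stand-in for compressed DenseNet169 descriptors
recon, fused = fcae(t, p)
fused = nn.functional.normalize(fused, dim=1)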
Keywords: illumination changes, viewpoint changes, convolutional descriptor, keypoint matching, SLAM