Multi-view stereo (MVS) is an important problem in computer vision. With the development of artificial intelligence, a growing range of techniques has been applied to this field, and learning-based MVS methods now outperform traditional ones. Learning-based MVS methods face three difficulties. First, features must be extracted and a cost volume constructed from the different views. Second, the cost volume must be regularized to remove noise, but regularization usually consumes a large amount of GPU memory, so the network cannot be trained on high-resolution images. Third, current learning-based methods have little 3D data available to them and cannot be trained in real environments. This thesis studies these three difficulties and makes the following three contributions:

(1) This thesis introduces an attention mechanism into the model, in contrast to view aggregation based on mean square error. To account for the different influence each view has on the target image, it proposes an adaptive aggregation module that assigns different weights to different views. Two variants are proposed: a pixel-based adaptive module and a voxel-based adaptive module. Experiments are carried out on the Tanks and Temples dataset. The results show that, compared with the aggregation module based on mean square error, the adaptive aggregation module improves accuracy and recall by 3% and 5% respectively. The voxel-based adaptive aggregation module outperforms the pixel-based one by 2%.

(2) This thesis introduces a recurrent neural network module into the cost volume regularization stage. Regularization of the three-dimensional cost volume is treated as regularization of two-dimensional cost volumes along the depth direction, with context maintained from one depth to the next. Regularizing the cost volume with the recurrent module greatly reduces GPU memory consumption, so that high-resolution images can be used for training even when GPU memory is limited, and it allows information at different depths to be retained or forgotten. Experiments are carried out on the Tanks and Temples dataset, and the results are compared with several mainstream network architectures. Compared with MVSNet, GPU memory consumption is reduced by 64% and running speed is improved by more than 90%.

(3) Throughout the training stage, this thesis adds self-supervised training on top of supervised training. For the self-supervised stage, it proposes loss functions at two levels, a pixel-level loss and a feature-level loss, which together form the total loss. Experiments are carried out on the Tanks and Temples dataset, and the results show that adding self-supervised training effectively improves the model's performance in unfamiliar scenes. Compared with the original model, the model with self-supervised training improves accuracy by 3% and recall by 10%. ...
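The difference between equal-weight view aggregation and the adaptive aggregation described in (1) can be illustrated with a minimal NumPy sketch. It is not the thesis implementation. It assumes that the baseline is a per-voxel variance across views (one reading of "mean square error"), that the adaptive module amounts to a weighted variance with per-view weights normalized by a softmax, and that the attention scores are simply given as arrays, whereas in a real model they would be predicted by a learned sub-network. The function names and tensor shapes are hypothetical.

```python
import numpy as np

def softmax_over_views(scores):
    """Normalize scores across the view axis (axis 0) so weights sum to 1."""
    shifted = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

def variance_aggregate(volumes):
    """Assumed baseline: every view contributes equally.

    volumes: feature volumes of shape (N, C, D, H, W) for N views.
    Returns the per-voxel variance across views, shape (C, D, H, W).
    """
    return volumes.var(axis=0)

def adaptive_aggregate(volumes, scores):
    """Assumed adaptive variant: a weighted variance across views.

    scores must broadcast against volumes:
      pixel-based weights: (N, 1, 1, H, W), shared over all depths
      voxel-based weights: (N, 1, D, H, W), one weight per depth sample
    """
    weights = softmax_over_views(scores)
    mean = (weights * volumes).sum(axis=0)
    return (weights * (volumes - mean) ** 2).sum(axis=0)

# Random stand-ins for warped feature volumes and attention scores.
rng = np.random.default_rng(0)
N, C, D, H, W = 3, 4, 8, 5, 6
volumes = rng.normal(size=(N, C, D, H, W))
pixel_scores = rng.normal(size=(N, 1, 1, H, W))
voxel_scores = rng.normal(size=(N, 1, D, H, W))

cost_pixel = adaptive_aggregate(volumes, pixel_scores)
cost_voxel = adaptive_aggregate(volumes, voxel_scores)
```

With identical scores for every view, the softmax weights all equal 1/N and the weighted form reduces to the equal-weight baseline, so under these assumptions the adaptive module is a strict generalization of it. The voxel-based variant differs from the pixel-based one only in letting the weight of a view change with depth.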