| Depth estimation aims to recover the distance from the camera to scene objects in an image or video. Its goal is to reconstruct a 3D model of the scene, which supports a better understanding of scene geometry and improves a range of vision applications. Light field imaging captures both the intensity and the angular information of incident light rays simultaneously, so a light field records the scene more completely than a traditional 2D image. Because of this rich structural information, light field depth estimation can achieve more accurate results than depth estimation from traditional 2D images. Existing light field depth estimation methods usually use convolutional neural networks (CNNs) to learn the geometric structure of the light field from its sub-aperture images and then predict the light field disparity map. These methods suffer from highly redundant model input and from the limited amount of data in light field datasets. To address these problems, this thesis proposes two light field depth estimation methods based on the epipolar plane image (EPI): EpiFormer and SFMM-EpiFormer. The two methods are structurally progressive, and both improve significantly on traditional CNN methods in textureless, occluded, and noisy regions. To address the high input redundancy of existing light field depth estimation models and the limited amount of data in light field datasets, this thesis uses a Transformer network to explicitly learn light field geometry from EPI patches and proposes EpiFormer, a light field depth estimation method based on EPI patches. For each pixel, the method extracts a set of EPI patches along four directions (0°, 45°, 90°, and 135°) and uses them as input. It learns the light field geometric features of the four directions, fuses them in a Transformer feature fusion module, and predicts the disparity of the pixel at the center of the EPI patches. Among existing depth estimation methods based on EPI analysis, EpiFormer ranked first in mean error on both the BadPix(0.07) and MSE (mean squared error) metrics of the HCI 4D light field depth estimation evaluation dataset. EPI patches, however, lack spatial structure information and are easily affected by occlusion and noise. To address this, the thesis proposes SFMM-EpiFormer, a light field depth estimation network based on spatial EPIs. The method introduces spatial information into EpiFormer by extracting additional adjacent EPI patches in each of the four directions (0°, 45°, 90°, and 135°), and it adds a spatial feature matching module that strengthens the network's spatial awareness and compensates for the shortcomings of EpiFormer. On the BadPix(0.07) and MSE metrics of the HCI 4D light field depth estimation dataset, SFMM-EpiFormer reduced the mean error relative to EpiFormer by 36% and 50%, respectively. |
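
The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it relies on: cutting per-pixel EPI patches along the 0°, 45°, 90°, and 135° directions, and scoring a predicted disparity map with BadPix(0.07) and MSE. Everything named here is an assumption made for illustration, not the thesis code: the function names `epi_patches` and `badpix_and_mse`, the `(U, V, H, W)` grayscale array layout, the patch radius `r`, and the sign convention used for the two diagonal directions. The metric scaling (BadPix as a percentage, MSE multiplied by 100) follows the convention commonly used with the HCI 4D benchmark as far as I understand it; evaluation masks are omitted.

```python
import numpy as np


def epi_patches(lf, y, x, r):
    """Cut one EPI patch per direction around pixel (y, x).

    Assumed layout (hypothetical): lf has shape (U, V, H, W) with U == V,
    where U/V index the vertical/horizontal view position and H/W are the
    spatial axes. Each returned patch has shape (U, 2 * r + 1).
    """
    n_u, n_v, h, w = lf.shape
    assert n_u == n_v, "this sketch assumes a square angular grid"
    assert r <= y < h - r and r <= x < w - r, "patch must fit in the image"

    c = n_u // 2                      # index of the central view
    views = np.arange(n_u)[:, None]   # view index along a line of views
    t = np.arange(-r, r + 1)[None, :] # spatial offset along the EPI line

    return {
        # 0 deg: central row of views, spatial line along x
        0: lf[c, :, y, x - r:x + r + 1],
        # 90 deg: central column of views, spatial line along y
        90: lf[:, c, y - r:y + r + 1, x],
        # 45 deg: main-diagonal views, spatial line along the same diagonal
        45: lf[views, views, y + t, x + t],
        # 135 deg: anti-diagonal views, spatial line along the anti-diagonal
        # (the sign convention here is an assumption and is dataset dependent)
        135: lf[views, views[::-1], y + t, x - t],
    }


def badpix_and_mse(pred, gt, threshold=0.07):
    """Return (BadPix in percent, MSE multiplied by 100) for disparity maps."""
    err = np.abs(pred - gt)
    badpix = 100.0 * float(np.mean(err > threshold))
    mse = 100.0 * float(np.mean(err ** 2))
    return badpix, mse
```

In this sketch the center element of every patch, `patch[U // 2, r]`, is the pixel `(y, x)` in the central view, which is the pixel whose disparity a network such as EpiFormer would predict from the four patches.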