In computer vision, recovering depth information from one or more 2D images is a key problem. Depth information enables computers to recognize and reconstruct real 3D scenes from two-dimensional images more accurately, and thereby to support a wider range of visual tasks. Depth estimation has broad application prospects in 3D reconstruction, autonomous driving, intelligent devices, virtual reality, and other fields. Common imaging equipment captures only two-dimensional images without depth information, while commercially available depth-acquisition hardware is limited by its short sensing range and other factors. This dissertation therefore presents a comprehensive study of image depth estimation algorithms. First, it surveys the current state and development trends of monocular and binocular depth estimation. Second, it discusses the core problem of binocular depth estimation, namely stereo matching, and the core problems of monocular depth estimation, namely pixel-level regression and classification. For binocular depth estimation, a genetic optimization stereo matching algorithm based on multiple brightness layers is proposed to handle large brightness differences in practical scenes. The application of convolutional neural networks to binocular stereo matching is also analyzed, and a fast dilated multi-scale convolutional neural network algorithm for stereo matching is proposed. For monocular depth estimation, the intrinsic relationship between monocular depth estimation and semantic segmentation is discussed, and depth estimation is reformulated from a regression problem into a classification problem. An encoder-decoder structure based on a spatial pyramid upsampling module is proposed to predict depth from a single (monocular) image.

The main innovations of this research are as follows:

1. A genetic optimization stereo matching algorithm based on multiple brightness layers is
proposed. The algorithm builds multiple brightness layers from the local extrema of a curve fitted to the image histogram and matches these layers between the left and right images. Once the brightness layers are matched, brightness variation within each matched layer pair drops sharply, which greatly improves the accuracy of both stereo matching and feature point matching on that pair. In addition, a fast segmentation-based local stereo matching algorithm is proposed. Reusing the feature point matches already computed in other steps, together with the scale-invariant feature transform (SIFT), it replaces the fixed disparity search range with a range derived from high-density feature points, which improves stereo matching accuracy and reduces computational cost. On this basis, an improved genetic optimization algorithm for stereo matching takes the disparity maps obtained from the brightness layer pairs as genes, defines continuity and accuracy fitness functions, and uses crossover and mutation operations adapted to the characteristics of stereo matching. Experimental results on the Middlebury and KITTI datasets show that the multi-brightness-layer mechanism reduces the impact of the average brightness difference on stereo matching by more than 80%. Under varying illumination, disparity, rotation, and scaling conditions, depth estimation accuracy improves by more than 4% and computation time decreases by more than 34% compared with a SIFT-based genetic algorithm with a multi-objective fitness function.

2. A fast dilated multi-scale convolutional neural network algorithm for binocular stereo matching is proposed, in which neural networks are used to compute the stereo matching cost. Because small convolution kernels cover only small feature regions, dilated convolution kernels are used to improve stereo matching accuracy while keeping the number of parameters small. A multi-scale network structure computes the loss
function at different scales to further improve matching accuracy. A network structure with a multi-stage dot-product composite loss function is proposed and combined with depthwise separable convolution to greatly increase the network's computing speed. The multi-stage composite loss is applied to the multi-scale network, and the time-consuming fully connected layers are replaced by a direct dot product when computing the loss, which greatly reduces the number of network parameters and the computation time. Experimental results on the KITTI dataset show that, compared with the best-performing algorithm among similar convolutional networks, the depth estimation error rate is reduced by 0.17% and the computation speed is increased by about 6 times.

3. An encoder-decoder framework based on a spatial pyramid upsampling mechanism is proposed for monocular depth estimation. First, a learnable spatial pyramid upsampling mechanism is proposed that integrates upsampling and multi-scale feature extraction into a single module and exploits the mutual information in the data more fully than non-learnable upsampling methods. Second, a monocular depth estimation encoder-decoder framework is proposed: the encoder uses DenseNet to extract features, and the decoder uses cascaded spatial pyramid upsampling modules to recover the image resolution. In addition, bottleneck structures and depthwise separable convolution (DSC) reduce the number of parameters in the whole framework and improve estimation efficiency. Experimental results on the NYU Depth V2 dataset show that the average relative error of depth estimation is reduced by 2.4% compared with the extended residual network algorithm with soft-weight inference, which also adopts a classification strategy. On the Make3D dataset, the average relative error of depth estimation is reduced by 5.7% compared with the Fully Convolutional Residual Networks (FCRN) algorithm.
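Innovation 1 splits the intensity range into brightness layers at local extrema of a histogram fitting curve. The sketch below illustrates that idea under assumptions not stated in the abstract: a Gaussian-smoothed histogram stands in for the fitting curve, and only local minima are used as layer boundaries. The helper names `brightness_layer_bounds` and `brightness_layers` are hypothetical.

```python
import numpy as np

def brightness_layer_bounds(gray, sigma=3.0):
    """Return layer boundaries [0, m1, m2, ..., 256] at local minima of a smoothed histogram.

    Assumption: Gaussian smoothing approximates the 'histogram fitting curve';
    the dissertation's actual fitting method may differ.
    """
    g = np.asarray(gray, dtype=np.int64)
    hist, _ = np.histogram(g, bins=256, range=(0, 256))
    # Normalized Gaussian kernel used to smooth the histogram into a curve.
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    curve = np.convolve(hist.astype(float), kernel, mode="same")
    # Interior local minima of the curve separate neighbouring brightness modes.
    minima = [i for i in range(1, 255)
              if curve[i] < curve[i - 1] and curve[i] <= curve[i + 1]]
    return [0] + minima + [256]

def brightness_layers(gray, bounds):
    """One boolean mask per brightness layer covering intensities in [lo, hi)."""
    g = np.asarray(gray, dtype=np.int64)
    return [(g >= lo) & (g < hi) for lo, hi in zip(bounds[:-1], bounds[1:])]
```

Applying the same boundaries to the left and right images yields layer pairs on which matching can be run separately; some layers may be empty and can simply be skipped.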
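Innovation 1 also replaces the fixed disparity search range with one derived from matched feature points. A minimal sketch of that idea, assuming rectified images (so disparity is `x_left - x_right`) and that keypoint matches, e.g. from SIFT, are already available as coordinate arrays; the matching step itself, the `margin` parameter, and the plain SAD block matcher used here are illustrative assumptions, not the dissertation's segmentation-based matcher.

```python
import numpy as np

def disparity_range_from_matches(left_pts, right_pts, margin=2):
    """Search range [lo, hi] covering all matched-keypoint disparities plus a margin."""
    d = np.asarray(left_pts, float)[:, 0] - np.asarray(right_pts, float)[:, 0]
    lo = max(0, int(np.floor(d.min())) - margin)
    hi = int(np.ceil(d.max())) + margin
    return lo, hi

def box_sum(a, r):
    """Sum over a (2r+1) x (2r+1) window, replicating edge pixels."""
    p = np.pad(a, r, mode="edge")
    h, w = a.shape
    out = np.zeros_like(a)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += p[dy:dy + h, dx:dx + w]
    return out

def sad_block_match(left, right, d_min, d_max, radius=2, invalid_cost=255.0):
    """Winner-take-all SAD block matching restricted to disparities in [d_min, d_max]."""
    left = np.asarray(left, float)
    right = np.asarray(right, float)
    h, w = left.shape
    d_max = min(d_max, w - 1)
    best = np.full((h, w), np.inf)
    disp = np.zeros((h, w), dtype=int)
    for d in range(d_min, d_max + 1):
        cost = np.full((h, w), invalid_cost)  # pixels with x - d < 0 have no partner
        cost[:, d:] = np.abs(left[:, d:] - right[:, : w - d])
        agg = box_sum(cost, radius)
        better = agg < best
        best[better] = agg[better]
        disp[better] = d
    return disp
```

Narrowing `[d_min, d_max]` shrinks the loop over candidate disparities, which is where the computational saving comes from.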
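The genetic optimization in innovation 1 treats the disparity maps from the brightness layer pairs as genes and scores candidates with continuity and accuracy fitness terms. The sketch below is one possible encoding under stated assumptions: each individual picks, per pixel, which candidate map to use; accuracy is a photometric warp error, continuity is a mean neighbour disparity jump, crossover swaps whole image rows (scanlines in rectified stereo), and mutation reassigns random pixels. The functions `warp_error`, `smoothness`, `fitness`, and `ga_fuse` and all parameter values are hypothetical.

```python
import numpy as np

def warp_error(left, right, disp):
    """Accuracy term: mean |L(y, x) - R(y, x - d)|, clamping x - d to the image."""
    h, w = left.shape
    ys, xs = np.indices((h, w))
    xr = np.clip(xs - disp, 0, w - 1)
    return np.abs(left - right[ys, xr]).mean()

def smoothness(disp):
    """Continuity term: mean absolute disparity jump between 4-neighbours."""
    return np.abs(np.diff(disp, axis=0)).mean() + np.abs(np.diff(disp, axis=1)).mean()

def fitness(genes, cands, left, right, lam):
    """Higher is better: negative weighted sum of accuracy and continuity costs."""
    disp = np.take_along_axis(cands, genes[None], axis=0)[0]
    return -(warp_error(left, right, disp) + lam * smoothness(disp))

def ga_fuse(cands, left, right, pop=20, gens=40, p_mut=0.02, lam=0.5, seed=0):
    """Fuse K candidate disparity maps (K, H, W); returns fused map and best score per generation."""
    rng = np.random.default_rng(seed)
    k, h, w = cands.shape
    left, right = left.astype(float), right.astype(float)
    population = rng.integers(0, k, size=(pop, h, w))
    history = []
    for _ in range(gens):
        scores = np.array([fitness(g, cands, left, right, lam) for g in population])
        history.append(scores.max())
        elite = population[np.argsort(scores)[::-1][: pop // 2]]  # elitism keeps the best half
        children = []
        while len(children) < pop - len(elite):
            a, b = elite[rng.integers(len(elite), size=2)]
            cut = rng.integers(1, h)  # row-wise one-point crossover
            child = np.vstack([a[:cut], b[cut:]])
            mutate = rng.random((h, w)) < p_mut
            child[mutate] = rng.integers(0, k, size=mutate.sum())
            children.append(child)
        population = np.concatenate([elite, np.array(children)])
    scores = np.array([fitness(g, cands, left, right, lam) for g in population])
    history.append(scores.max())
    best = population[np.argmax(scores)]
    return np.take_along_axis(cands, best[None], axis=0)[0], history
```

Because the elite half survives unchanged into the next generation, the best fitness in `history` can never decrease.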
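Innovation 2 replaces the time-consuming fully connected layers with a direct dot product between left and right features. A minimal sketch of that matching step, assuming per-pixel feature maps of shape `(C, H, W)` have already been produced by the network (random arrays stand in for them here) and that features are L2-normalized so the dot product is a cosine similarity; `dot_product_cost_volume` is a hypothetical name.

```python
import numpy as np

def dot_product_cost_volume(feat_l, feat_r, max_disp):
    """Similarity volume S[d, y, x] = <f_L(y, x), f_R(y, x - d)> for unit-norm features."""
    fl = feat_l / np.linalg.norm(feat_l, axis=0, keepdims=True)
    fr = feat_r / np.linalg.norm(feat_r, axis=0, keepdims=True)
    c, h, w = fl.shape
    vol = np.full((max_disp + 1, h, w), -np.inf)  # -inf where x - d leaves the image
    for d in range(max_disp + 1):
        # One multiply-accumulate per channel per pixel: no fully connected layer needed.
        vol[d, :, d:] = np.einsum("chw,chw->hw", fl[:, :, d:], fr[:, :, : w - d])
    return vol
```

Taking `vol.argmax(axis=0)` gives a winner-take-all disparity map; the design saving is that comparing two feature vectors costs only `C` multiplications instead of a stack of fully connected layers per candidate disparity.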
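Innovations 2 and 3 both rely on dilated and depthwise separable convolutions to keep the parameter count small. The arithmetic below is standard for these operations and is shown only to make the savings concrete; the layer sizes in the test are illustrative, not taken from the dissertation's networks.

```python
def conv_params(k, c_in, c_out, bias=False):
    """Weights in a standard k x k convolution from c_in to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def dsc_params(k, c_in, c_out, bias=False):
    """Depthwise k x k conv (one filter per input channel) followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise

def dilated_extent(k, dilation):
    """Spatial extent covered by a k x k kernel at the given dilation rate (same weight count)."""
    return k + (k - 1) * (dilation - 1)
```

For a 3 x 3 layer mapping 64 to 128 channels, `conv_params` gives 73,728 weights and `dsc_params` gives 8,768, a ratio of roughly 1/c_out + 1/k^2. A 3 x 3 kernel with dilation 2 covers a 5 x 5 region while still using 9 weights, which is how dilation enlarges the feature region without adding parameters.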
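The monocular part reformulates depth estimation as classification and reports the average relative error. The sketch below shows one common way to do this, assuming depth bins spaced uniformly in log depth and a soft-weighted inference step (probability-weighted sum of bin centres) similar in spirit to the soft-weight inference baseline named in innovation 3; the bin layout, the function names, and the inference rule are assumptions, not the dissertation's exact design.

```python
import numpy as np

def make_log_bins(d_min, d_max, n_bins):
    """Bin centres spaced uniformly in log depth (assumed discretisation)."""
    return np.exp(np.linspace(np.log(d_min), np.log(d_max), n_bins))

def depth_to_label(depth, centres):
    """Classification target: index of the nearest bin centre in log space."""
    return np.abs(np.log(depth)[..., None] - np.log(centres)).argmin(axis=-1)

def soft_weighted_depth(logits, centres):
    """Soft-weighted inference: softmax over bins, then probability-weighted sum of centres."""
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return (p * centres).sum(axis=-1)

def mean_relative_error(pred, gt):
    """Average relative error: mean(|pred - gt| / gt)."""
    return np.mean(np.abs(pred - gt) / gt)
```

Log spacing gives finer bins at short range, where absolute depth errors matter most; soft weighting lets the prediction fall between bin centres instead of snapping to one.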