Depth information is essential for tasks such as 3D scene reconstruction, simultaneous localization and mapping (SLAM), and autonomous driving. Obtaining depth from monocular images is portable, inexpensive, and simple to operate, and it can yield full-resolution depth images. Monocular depth estimation is therefore a popular research direction in depth estimation. At present, however, deep-learning-based monocular depth estimation methods still suffer from inadequate image feature representation and low resolution of the estimated depth images. This paper therefore focuses on feature extraction and representation, aiming to improve the accuracy of depth estimation while producing higher-resolution depth images. The main work of this paper is as follows:

(1) A monocular depth estimation network based on multi-scale feature fusion is designed. The model adopts an encoder-decoder framework. The encoder uses an asymmetric convolution design that extracts and fuses image features with rectangular convolution kernels of different sizes, improving the expressiveness of the feature maps. The decoder uses a multi-branch upsampling module with skip connections to the encoder, improving the model's ability to infer fine depth detail of objects in the image.

(2) A Wasserstein distance loss is introduced to improve the inference ability of the depth estimation network. This loss measures the difference between the distributions of the point clouds recovered from the depth images and thereby further constrains the network, improving its ability to infer depth at pixels where the depth of an object surface changes abruptly.

(3) A pose estimation network is introduced to improve the accuracy of the depth inferred by the depth estimation network. The pose estimation network infers the viewpoint transformation between the two images of a stereo pair. Using this inferred transformation together with the camera intrinsic parameters and the depth image, the stereo image pair is rigorously reconstructed, which in turn improves the accuracy of the depth estimated by the depth estimation network.

To verify its effectiveness, the proposed method is evaluated on the KITTI dataset. The experimental results show that the proposed model compares favorably with models from similar methods in terms of absolute relative error, accuracy, and other metrics.
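The abstract names the evaluation metrics but does not define them. As a minimal sketch, the snippet below assumes the definitions commonly used for monocular depth evaluation on KITTI: absolute relative error as the mean of |d_pred - d_gt| / d_gt, and accuracy as the fraction of pixels with max(d_pred / d_gt, d_gt / d_pred) below a threshold (1.25 is assumed here). The function name `depth_metrics` and the flat-list input format are hypothetical and chosen only for illustration; the paper's actual evaluation code may differ.

```python
def depth_metrics(pred, gt, threshold=1.25):
    """Return (abs_rel, accuracy) for two equal-length sequences of depths.

    Assumed definitions (not taken from the paper):
      abs_rel  = mean(|pred - gt| / gt)
      accuracy = fraction of pixels with max(pred/gt, gt/pred) < threshold
    Pixels without a positive ground-truth or predicted depth are skipped,
    since sparse ground truth typically marks missing values with 0.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    if not pairs:
        raise ValueError("no valid pixels to evaluate")

    # Mean absolute error relative to the ground-truth depth.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / len(pairs)

    # Share of pixels whose prediction is within the ratio threshold.
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    accuracy = hits / len(pairs)
    return abs_rel, accuracy


# Toy example: four pixels in metres, the last one has no ground truth.
predicted = [10.0, 24.0, 30.0, 5.0]
ground_truth = [10.0, 20.0, 20.0, 0.0]
abs_rel, accuracy = depth_metrics(predicted, ground_truth)
print(f"abs_rel={abs_rel:.4f}, accuracy={accuracy:.4f}")
```

Lower absolute relative error and higher threshold accuracy both indicate better depth estimates, so the two values are typically reported together.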