Stereo matching has been a focus of research in computer vision and computer graphics, with applications in many fields such as 3D reconstruction, robot navigation, autonomous driving and unmanned vehicles. Disparity estimation is the process of estimating disparity from a pair of rectified binocular images. It is an ill-posed problem, because the same object may appear differently in the two images. Deep learning-based algorithms have achieved considerable success in stereo matching. However, they still have problems: the models run slowly, and their accuracy still suffers in thin structures, weakly textured regions and complex scenes. These problems limit the adoption of deep learning methods in practical applications. Moreover, industrial-grade applications must consider not only matching accuracy but also the speed of disparity estimation. It is therefore important to develop stereo matching algorithms that are both real-time and highly accurate.

1. We present CMNet, a lightweight stereo matching architecture that improves the trade-off between speed and accuracy on resource-limited devices. We propose a novel feature extraction network consisting of a patch embedding layer and a Conv MLP-mixer. The patch embedding layer enlarges the receptive field and makes the feature vectors compact. The Conv MLP-mixer mixes spatial information along the channel dimension, which increases the accuracy of the disparity map. The absolute difference volume is concatenated with the group-wise correlation volume to provide multi-dimensional matching cost information for the cost aggregation stage. Evaluated on the KITTI 2012 and KITTI 2015 stereo matching datasets, CMNet has an inference time of 8.7 ms on an NVIDIA RTX 2080 Ti GPU. While running faster than real time, it achieves D1-all errors of 3.41% on KITTI 2012 and 3.84% on KITTI 2015, a state-of-the-art trade-off between speed and accuracy. In addition, the lightweight architecture of CMNet runs inference in 40.7 ms on an NVIDIA Jetson Nano, enabling real-time applications on edge devices.

2. We propose a real-time stereo matching algorithm based on a multi-scale 3D-CNN. The algorithm achieves fast inference, improves matching accuracy, and balances accuracy and speed. First, for network structure and feature extraction, the model uses the Visual Attention Network (VAN) as the feature extraction module, which combines local contextual information, a large receptive field, linear complexity and dynamic (input-adaptive) processing. The large kernel attention (LKA) in VAN provides adaptability in both the spatial and the channel dimensions. Second, the cost volume is constructed at multiple scales, and 3D convolutions aggregate features on each scale, which effectively improves matching accuracy. Together, VAN and the proposed multi-scale 3D-CNN module significantly improve the accuracy of stereo matching in complex scenes, reducing D1-all on the KITTI 2012 and KITTI 2015 datasets to 2.80% and 3.02%, respectively, with an inference time of 12.6 ms on an NVIDIA RTX 2080 Ti GPU, and thus maintain a good balance between accuracy and speed.
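To make CMNet's combined cost volume and the D1-all metric more concrete, here is a minimal NumPy sketch under stated assumptions. The group-wise correlation follows the common GwcNet-style formulation (mean inner product of left and right features within each channel group). The absolute difference volume is assumed to be the per-channel |f_L - f_R| at each candidate disparity. `d1_all` follows the KITTI 2015 outlier rule (error greater than 3 px and greater than 5% of the ground truth, counted over pixels with ground truth). The function names, the toy feature shapes, the group count and the disparity range are all hypothetical illustrations, not the thesis's actual settings.

```python
import numpy as np


def groupwise_correlation(left, right, num_groups):
    """Mean inner product of left/right features within each channel group.

    left, right: arrays of shape (C, H, W'); returns shape (num_groups, H, W').
    """
    c, h, w = left.shape
    if c % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    prod = (left * right).reshape(num_groups, c // num_groups, h, w)
    return prod.mean(axis=1)


def build_combined_volume(left, right, max_disp, num_groups):
    """Concatenate a group-wise correlation volume with an absolute difference volume.

    Assumed layout: (num_groups + C, max_disp, H, W). At disparity d, left pixel x
    is compared with right pixel x - d; columns x < d have no partner and stay zero.
    """
    c, h, w = left.shape
    gwc = np.zeros((num_groups, max_disp, h, w), dtype=left.dtype)
    absdiff = np.zeros((c, max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        lf = left[:, :, d:]          # left pixels x = d .. W-1
        rf = right[:, :, : w - d]    # matching right pixels x - d = 0 .. W-1-d
        gwc[:, d, :, d:] = groupwise_correlation(lf, rf, num_groups)
        absdiff[:, d, :, d:] = np.abs(lf - rf)
    return np.concatenate([gwc, absdiff], axis=0)


def d1_all(pred, gt, valid=None):
    """Fraction of valid pixels whose error exceeds both 3 px and 5% of ground truth."""
    err = np.abs(pred - gt)
    outlier = (err > 3.0) & (err > 0.05 * np.abs(gt))
    if valid is None:
        valid = gt > 0  # KITTI ground truth is sparse; 0 marks missing pixels
    return float(outlier[valid].mean())


# Toy example: random features and a right view shifted so the true disparity is 2.
rng = np.random.default_rng(0)
left = rng.standard_normal((8, 4, 6))          # (C, H, W)
right = np.zeros_like(left)
right[:, :, :-2] = left[:, :, 2:]              # right[x - 2] == left[x]
volume = build_combined_volume(left, right, max_disp=4, num_groups=2)
cost = volume[2:].mean(axis=0)                 # mean absolute difference, (D, H, W)
pred = cost[:, :, 3:].argmin(axis=0)           # columns where every d in 0..3 is valid
```

In the toy pair, the absolute-difference channels vanish exactly at d = 2, so a winner-take-all argmin recovers that disparity. A real network would instead pass the concatenated volume to 3D convolutions for cost aggregation; the correlation channels measure feature similarity while the difference channels keep per-channel discrepancy information.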