The purpose of binocular stereo matching is to obtain the disparity of corresponding pixels from a binocular image pair and thereby recover the depth of those pixels. Binocular stereo matching has received widespread attention because of its universality and flexibility, and it is a key technology for the autonomous operation of robots equipped with binocular vision. Improving its matching accuracy is therefore a focus of academic research. Traditional stereo matching methods suffer from low matching accuracy in ill-posed regions such as reflective areas and object edges (where disparity is discontinuous), which leads to incorrect depth output. Although traditional algorithms have been continuously improved, matching in these ill-posed regions remains challenging. With the growth in processor computing power, deep-learning-based stereo matching offers a powerful means of addressing the low accuracy of traditional methods in ill-posed regions. However, how to improve the matching performance of deep networks through their topology and feature utilization is still a challenging frontier topic. The recently proposed GWC-Net has shown excellent performance, but it still suffers from limited interpretability and insufficient use of image features. This thesis proposes two end-to-end stereo matching networks based on GWC-Net and verifies their performance improvements through experiments. The main research contents of the thesis are as follows:

(1) This thesis reviews the basic theory of binocular stereo matching, the topology of deep learning networks, and the applications of stereo matching. Several typical deep-learning-based stereo matching schemes are analyzed in depth, and the network structure and implementation principle of the benchmark model GWC-Net are investigated.

(2) An end-to-end network based on a dense connection attention mechanism, called DCA-GWC-Net for short, is proposed. In the initial feature extraction stage, a densely connected (Dense) network is introduced to extract richer features and improve feature utilization, and an improved coordinate attention mechanism is introduced to capture both the correlation between image feature channels and spatial (position) information. Together, the Dense network and the coordinate attention mechanism improve the matching performance of the network, and the matching accuracy in object details and reflective areas is significantly improved.

(3) Building on the dense connection attention mechanism, a multi-scale information fusion end-to-end network, DCA-SF-GWC-Net, is proposed. In the cost volume construction stage, a cross-scale group-wise correlation cost volume is proposed, based on the group-wise correlation cost volume construction of GWC-Net, which learns the important spatial and channel information in the cost volume. A scale-aware fusion model is proposed to obtain denser and more precise disparity maps by exploiting the complementary differences among disparity maps at multiple scales.

Experiments on the Scene Flow and KITTI 2015 datasets show that DCA-GWC-Net significantly improves matching accuracy in object details and reflective areas. DCA-SF-GWC-Net further improves on DCA-GWC-Net: experiments on the same datasets show that its matching accuracy is significantly improved at object edges and in reflective areas, which demonstrates the robustness and effectiveness of the improved network models proposed in this thesis.
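
As a rough illustration of the group-wise correlation cost volume from GWC-Net that the proposed networks build on, the sketch below splits the feature channels into groups and, for each candidate disparity, averages the per-channel products of left and right features within each group. It is a minimal pure-Python sketch on nested lists, not the implementation used in the thesis; the function name `groupwise_correlation_volume`, its parameters, and the `[channel][row][col]` layout are assumptions made for illustration, and the cross-scale extension is not shown.

```python
def groupwise_correlation_volume(left, right, max_disp, num_groups):
    """Build a group-wise correlation cost volume (illustrative sketch).

    left, right: feature maps as nested lists indexed [channel][row][col].
    Returns a nested list indexed [group][disparity][row][col].
    """
    channels = len(left)
    height, width = len(left[0]), len(left[0][0])
    if channels % num_groups != 0:
        raise ValueError("channels must be divisible by num_groups")
    per_group = channels // num_groups

    # Start from an all-zero volume of shape [groups][disparities][H][W].
    volume = [[[[0.0] * width for _ in range(height)]
               for _ in range(max_disp)]
              for _ in range(num_groups)]

    for g in range(num_groups):
        group_channels = range(g * per_group, (g + 1) * per_group)
        for d in range(max_disp):
            for y in range(height):
                # Left pixel x is compared with right pixel x - d;
                # columns with x < d have no counterpart and stay 0.
                for x in range(d, width):
                    dot = sum(left[c][y][x] * right[c][y][x - d]
                              for c in group_channels)
                    # Mean over the channels of this group.
                    volume[g][d][y][x] = dot / per_group
    return volume


# Tiny example: 2 channels, 1 row, 3 columns, disparities 0 and 1.
left_feat = [[[1.0, 2.0, 3.0]], [[1.0, 1.0, 1.0]]]
right_feat = [[[2.0, 3.0, 0.0]], [[1.0, 1.0, 1.0]]]
cost = groupwise_correlation_volume(left_feat, right_feat,
                                    max_disp=2, num_groups=2)
```

In a real network the volume would be computed on learned feature tensors and then passed to a 3D aggregation stage; the nested-list form here only makes the indexing explicit.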