Research On Binocular Vision Stereo Matching Method Based On Convolutional Neural Network

Posted on: 2024-07-11
Degree: Master
Type: Thesis
Country: China
Candidate: X G Li
GTID: 2568307058471984
Subject: Electronic information
Abstract/Summary:
Stereo matching is an important technique for three-dimensional scene perception, providing basic algorithmic support for autonomous driving, 3D reconstruction, and detection. Recovering 3D structure from a single 2D image is inherently ambiguous, so it is usually done from multiple viewpoints, and binocular stereo matching is an important basis for depth estimation. Many strong stereo matching methods based on convolutional neural networks now exist, but matching is still disturbed by ill-posed image regions such as shadows, textureless or weakly textured areas, reflections, and occlusions, which reduces the accuracy of the disparity map. In addition, datasets with either dense or sparse labels each impose limitations on research. This thesis builds two models, aimed respectively at obtaining a more accurate disparity map and at constructing a lighter unsupervised stereo matching network. Using adaptive features, attention mechanisms, and related methods, the steps of stereo matching are built, integrated, and optimized within the network. The main work is as follows:

(1) To address the difficulty of inferring disparity correctly when pixel differences are small in ill-posed regions, this thesis proposes a hybrid-domain normalized stereo matching method with adaptive features. The method lets the network adaptively focus on regions of interest to learn an object's correlated features, and alleviates the disparity "truncation" problem that arises when the same object is predicted at different disparity levels. It also strengthens the network's disparity inference at image edges. Furthermore, stereo matching is sensitive to the channel domain and relies on channel features to find exact pixel correspondences. The thesis therefore enhances the channel-domain feature representation of each individual sample on top of multiple forms of normalization, giving more reliable disparity inference in ill-posed regions. The method achieves an endpoint error of 0.79 pixels on the SceneFlow dataset and a matching error rate of 2.38% on the KITTI2015 dataset.

(2) To build a more robust 3D matching cost volume and a lighter feature extractor, this thesis proposes computing the matching cost by combining cross-correlation with Manhattan distance, which mitigates incomplete pixel matching. The cost volume is constructed implicitly while building the attention mechanism, which greatly limits the parameter growth that additional feature filters would otherwise cause. In addition, to improve the efficiency of disparity regression and reduce the dependence of training on supervised labels, depth prediction is cast as a regression problem during iterative disparity-map estimation, and unsupervised training is carried out using the idea of synthesized (composite) images. The final disparity map is obtained through residual disparity correction. Extensive experiments on the SceneFlow and KITTI datasets show that the model produces strong disparity results in both object-edge and foreground regions.
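The idea of combining cross-correlation with Manhattan distance in a matching cost can be illustrated with a minimal sketch on a single rectified scanline. The abstract does not say how the two terms are combined, so the weighted blend, the `alpha` weight, the unit-norm scaling of features, the winner-take-all selection, and all function names below are illustrative assumptions, not the thesis's network.

```python
import math


def unit(v):
    """Scale a feature vector to unit L2 norm (a zero vector is returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def matching_cost(f_left, f_right, alpha=0.5):
    """Blend two similarity cues into one cost (lower = better match).

    Manhattan distance is low for similar features, cross-correlation is
    high, so the correlation term enters with a negative sign.
    `alpha` is a hypothetical weighting, not a value from the thesis.
    """
    a, b = unit(f_left), unit(f_right)
    l1 = sum(abs(x - y) for x, y in zip(a, b))   # Manhattan distance
    corr = sum(x * y for x, y in zip(a, b))      # cross-correlation
    return alpha * l1 - (1.0 - alpha) * corr


def scanline_disparity(left_row, right_row, max_disp, alpha=0.5):
    """Winner-take-all disparity for one scanline.

    Left pixel x is compared with right pixel x - d for every
    candidate disparity d in [0, max_disp) that stays inside the row.
    """
    disparities = []
    for x, f_left in enumerate(left_row):
        candidates = range(min(max_disp, x + 1))
        costs = [matching_cost(f_left, right_row[x - d], alpha) for d in candidates]
        disparities.append(costs.index(min(costs)))
    return disparities


def endpoint_error(pred, truth):
    """Mean absolute disparity error in pixels."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)
```

With unit-norm features, the correlation term is at most 1 and the Manhattan term is at least 0, so two identical feature vectors reach the lowest possible cost, `-(1 - alpha)`. A learned network would typically replace the hard winner-take-all step with a differentiable regression over the cost volume.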
Keywords/Search Tags: Stereo matching, Adaptive feature, Hourglass network, Attention mechanism, Unsupervised training