Research On Stereo Matching In Computer Vision

Posted on: 2021-01-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W H Wu
GTID: 1368330611953144
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
By simulating human binocular vision, computer stereo vision perceives the three-dimensional world from a pair of stereo images of the same scene taken from different viewpoints. As a non-contact, passive depth-sensing method, stereo vision has been widely used in fields such as autonomous navigation, autonomous driving, 3D measurement, 3D reconstruction and virtual reality, because it is low in cost, easy to implement and able to recover dense 3D information about the scene. Stereo matching is the core technology of stereo vision. Its purpose is to determine the correspondences of all pixels between a pair of stereo images, so that depth can be recovered from the disparities between corresponding pixels.

However, stereo matching has always been a bottleneck for stereo vision. Firstly, stereo matching is an ill-posed problem, since it recovers 3D spatial information from 2D image planes. Secondly, the matching process faces many challenges, including imaging differences between the stereo images caused by radiometric changes, distortions introduced by epipolar rectification, matching ambiguity caused by similar textures, occlusion, and disparity expansion at object boundaries. Under the influence of these factors, quickly finding all true matching point pairs in the stereo images and then recovering complete and correct 3D information about the scene remains a very challenging research topic. This thesis studies the above problems, which limit the performance of stereo matching, and proposes several novel algorithms to address them. The main work and innovations of this thesis are as follows:

(1) An epipolar rectification algorithm based on the singular value decomposition of the essential matrix is proposed. Traditional rectification methods need iterative optimization and are prone to falling into local optima. To address this, a novel method based on the singular value decomposition of the essential matrix is proposed to obtain a closed-form solution of epipolar rectification. Firstly, using the prior information that the camera intrinsic parameters are known in a stereo vision system, the essential matrix is derived from the fundamental matrix and the known intrinsic parameters. Secondly, singular value decomposition is performed on the essential matrix, and the ambiguity of the decomposition is eliminated according to the inherent geometric constraints of the left and right cameras. Finally, in order to make the optical axes of the left and right cameras parallel to each other and perpendicular to the baseline, the corresponding transformations are applied to the decomposed orthogonal matrices, and the two resulting projective transformations are the closed-form solution of epipolar rectification. The proposed method therefore does not need any optimization process. Experimental results show that the proposed epipolar rectification method not only has higher efficiency and accuracy, but also produces less distortion.

(2) Aiming at the imaging differences caused by radiometric changes that often occur between stereo images, a stereo matching method based on Census features that is robust to radiometric changes is proposed. For each pixel, the Census transform is applied to the grayscale image and to two gradient images, and the generated bit strings are concatenated to construct a lightweight binary feature vector for the pixel. The similarity between the pixels to be matched is then measured by the Hamming distance between their feature vectors. In order to improve the signal-to-noise ratio of the disparity map and to deal effectively with disparity expansion at object boundaries, a new fused adaptive support weight strategy for cost volume filtering is proposed: a local edge-aware filter and a non-local edge-aware filter are each used to filter the cost volume, and their results are averaged to achieve fusion. Finally, the disparity map is computed by the "Winner Takes All" method, followed by post-processing for disparity refinement. Experimental results show that the proposed algorithm is not only robust to radiometric changes, but also preserves object edges in the disparity map well.

(3) Aiming at the matching ambiguity caused by similar textures, a fast cost aggregation method based on oriented linear trees is proposed. Firstly, each pixel in the image is the root of an oriented linear tree, and each oriented linear tree consists of multiple 1D paths from different directions, which avoids designing an optimal support window for each pixel. Secondly, the oriented linear trees are used to perform cost aggregation on the initial cost volume, so that each root pixel is supported not only by its neighboring pixels, but also by all other pixels in its tree along their own 1D paths. Finally, the disparity map is computed and used to construct a new cost volume, and the oriented linear trees are used to aggregate this new cost volume, so that valid disparities propagate to occluded and mismatched pixels along multiple 1D paths. In the implementation of cost aggregation, traversing each 1D path back and forth twice yields the aggregated cost along the path for all pixels on that path at once, which gives the proposed algorithm low computational complexity. Experimental results show that the proposed algorithm not only effectively eliminates matching ambiguity in different texture regions, but also greatly improves matching speed.

(4) Non-learning methods are unable to effectively mine the deep internal relations between stereo images in complex scenes. In view of this, an end-to-end group distance network is proposed, which predicts disparity by directly learning the mapping from stereo images to their disparity map. First, multiple residual modules are used to extract features at different depths for each pixel, and the feature vectors from these depths are concatenated to fuse features with different attributes. In order to reduce the dimension of the feature vector while limiting the loss of feature information, a new strategy of computing a group distance over grouped feature vectors is proposed. The corresponding pixels of the left and right views are aligned according to the disparity, and their feature vectors are divided into the same number of groups. The Euclidean distance between the sub-vectors in each pair of corresponding groups is then computed, and the Euclidean distances of all groups are concatenated to form the group distance vector. Once the distance vectors of all pixels at every disparity have been computed, a 4D cost volume can be constructed. In order to better merge the contextual feature information of neighboring pixels and the distance information of adjacent disparities, a cascaded 3D hourglass sub-network is adopted to filter the 4D cost volume. Finally, the filtered results are regressed to generate the final disparity map. Experimental results show that the disparity prediction performance of the proposed group distance network is better than that of the other convolutional neural network methods compared.
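As an illustration of the first two steps of contribution (1), the following minimal sketch derives an essential matrix from a fundamental matrix and decomposes it by SVD. It uses the standard textbook decomposition and is not the thesis's implementation: the function names, the convention x_right^T F x_left = 0, and the matrix W are assumptions made for this example, and the disambiguation and rectifying transformations described in the abstract are not reproduced.

```python
import numpy as np


def essential_from_fundamental(F, K_left, K_right):
    """E = K_right^T F K_left, assuming the convention x_right^T F x_left = 0."""
    return K_right.T @ F @ K_left


def decompose_essential(E):
    """Textbook SVD decomposition of an essential matrix.

    Returns two candidate rotations, a unit baseline direction (defined up to
    sign) and the singular values. Choosing among the candidates needs extra
    geometric constraints, which this sketch does not apply.
    """
    U, s, Vt = np.linalg.svd(E)
    # E is only defined up to scale and sign, so the orthogonal factors can be
    # flipped to proper rotations (determinant +1).
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]  # left null vector of E
    return R1, R2, t, s
```

For a noise-free essential matrix the two largest singular values are equal and the third is zero, which is a convenient sanity check before using the factors.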
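The Census-based matching cost of contribution (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's code: the 3x3 window (`radius=1`), the use of `np.gradient` for the two gradient images, and all function names are choices made for this example.

```python
import numpy as np


def census_transform(img, radius=1):
    """Return an (H, W, n_bits) boolean array.

    Bit k is True where the k-th neighbour in the window is darker than the
    centre pixel. Borders are handled by edge replication (an assumption).
    """
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    bits = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[radius + dy:radius + dy + h,
                               radius + dx:radius + dx + w]
            bits.append(neighbour < img)
    return np.stack(bits, axis=-1)


def census_descriptor(gray):
    """Concatenate Census bit strings of the image and its two gradients."""
    gray = gray.astype(np.float64)
    grad_y, grad_x = np.gradient(gray)
    channels = (gray, grad_x, grad_y)
    return np.concatenate([census_transform(c) for c in channels], axis=-1)


def hamming_cost(desc_a, desc_b):
    """Per-pixel Hamming distance between two binary descriptors."""
    return np.count_nonzero(desc_a != desc_b, axis=-1)
```

Because the Census transform only records intensity orderings, a gain or offset change such as `2 * gray + 10` leaves the descriptor unchanged, which is the property that makes the cost robust to radiometric changes.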
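The two-pass traversal mentioned in contribution (3) can be illustrated on a single 1D path. The sketch below assumes a particular form of aggregation, in which the support of pixel j for pixel i is the product of the edge weights between them; the abstract does not give the thesis's exact weighting, so this form, the function names, and the `weight` array are assumptions. A brute-force version is included only to show what the two passes compute.

```python
import numpy as np


def aggregate_path(cost, weight):
    """Aggregate costs along one 1D path with a forward and a backward pass.

    cost:   (N, D) matching costs for N pixels on the path and D disparities.
    weight: (N - 1,) edge weights in [0, 1]; weight[i] links pixels i and i + 1.
    """
    cost = cost.astype(np.float64)
    n = cost.shape[0]
    fwd = cost.copy()
    for i in range(1, n):
        fwd[i] += weight[i - 1] * fwd[i - 1]
    bwd = cost.copy()
    for i in range(n - 2, -1, -1):
        bwd[i] += weight[i] * bwd[i + 1]
    # Each pixel's own cost is counted in both passes, so subtract it once.
    return fwd + bwd - cost


def aggregate_path_brute_force(cost, weight):
    """O(N^2) reference: sum_j (product of weights between i and j) * cost[j]."""
    n = cost.shape[0]
    out = np.zeros(cost.shape, dtype=np.float64)
    for i in range(n):
        for j in range(n):
            lo, hi = min(i, j), max(i, j)
            out[i] += np.prod(weight[lo:hi]) * cost[j]
    return out
```

The recursive version touches each pixel a constant number of times per disparity, whereas the reference is quadratic in the path length; this is the kind of saving the abstract attributes to traversing each path back and forth.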
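The group distance construction of contribution (4) can be sketched as below. This is a hedged NumPy illustration of the idea, not the network code: the channel-first layout `(C, H, W)`, the convention that a left pixel at column x matches the right pixel at column x - d, the zero filling of columns with no valid match, and the name `group_distance_volume` are all assumptions for this example.

```python
import numpy as np


def group_distance_volume(feat_left, feat_right, max_disp, num_groups):
    """Build a 4D cost volume of shape (num_groups, max_disp, H, W).

    For each candidate disparity d, the left and right feature vectors are
    aligned, split into num_groups channel groups, and the Euclidean distance
    is computed within each group.
    """
    c, h, w = feat_left.shape
    assert c % num_groups == 0, "channels must divide evenly into groups"
    volume = np.zeros((num_groups, max_disp, h, w), dtype=np.float64)
    for d in range(max_disp):
        # Left column x is compared with right column x - d.
        diff = feat_left[:, :, d:] - feat_right[:, :, :w - d]
        diff = diff.reshape(num_groups, c // num_groups, h, w - d)
        volume[:, d, :, d:] = np.sqrt((diff ** 2).sum(axis=1))
    return volume
```

Compared with concatenating the full left and right feature vectors, the per-group distances shrink the channel dimension of the volume from 2C to `num_groups` while still keeping more than a single scalar per pixel and disparity.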
Keywords/Search Tags: stereo matching, epipolar rectification, Census feature, cost aggregation, oriented linear tree, convolutional neural network