Research On Stereo Image Perception Of Binocular Vision Based On Deep Learning

Posted on: 2022-04-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y R Zhang
Full Text: PDF
GTID: 1488306536498914
Subject: Electronic Science and Technology
Abstract/Summary:
With the rapid development of technology, dual-camera imaging systems have been widely adopted in smartphones, autonomous driving, intelligent robots, and other fields. A dual-camera system observes the same scene from two different perspectives. The two resulting images differ visually and contain rich complementary information, which benefits both the reconstruction of image details in two-dimensional space and visual positioning in three-dimensional space. In recent years, deep convolutional neural networks have shown strong image understanding capabilities and can extract robust depth representations directly from RGB stereo image pairs, far outperforming traditional algorithms based on handcrafted features. In practical applications, however, differences in scene depth and camera configuration (such as imaging resolution, camera baseline, and focal length) lead to large variations in the parallax between stereo image pairs. Using dual-camera imaging systems efficiently and flexibly to improve the resolution of stereo images and to perceive their disparity therefore has important theoretical significance, broad application prospects, and considerable social benefit. This dissertation starts from the construction of deep learning networks and studies stereo super-resolution and stereo matching. The specific research work is as follows.

First, to address the problem that the feature extraction network in stereo super-resolution relies on a single learning method and therefore acquires insufficient semantic information, a diversified feature learning sub-network model is proposed. A co-directional pyramid residual module is constructed to extract multi-scale features with a large receptive field, enriching the image representations of the left and right views. Based on the parallax attention mechanism and deformable convolution, a deformable parallax attention module is proposed. It extracts an adaptive image representation of the other view to capture the correspondence between stereo image pairs and improve the reconstruction performance of the stereo super-resolution network.

Second, the way left and right image information interacts also affects reconstruction quality in stereo super-resolution. The principles of self-attention, parallax attention, and fusion mechanisms are therefore studied, and an attention stereo fusion module is constructed to realize stereo-consistent interaction of left and right image features. In addition, an enhanced cross-view interaction strategy is proposed, consisting of three parts: horizontal dense connections between attention stereo fusion modules, vertical sparse connections between single-image super-resolution branches, and feature fusion. The parallax attention map and the stereo image pair are further integrated to strengthen the stereo consistency constraints and improve the recovery of spatial image details.

Third, because current deep-learning-based stereo matching networks still struggle to find corresponding points in ill-posed regions, a novel stereo matching network without post-processing is proposed. A multi-level feature pyramid pooling module is constructed, in which features of the same resolution are fused hierarchically to make full use of multi-level semantic information and improve the robustness of the image feature representation. A lightweight two-dimensional convolution sub-network is also proposed. It obtains low-level structural information from the target image by concatenating three convolutional layers with small kernels, corrects mismatched cost values from a global view, and further improves stereo matching accuracy.

Finally, to address the problem that existing cost aggregation networks in stereo matching still cannot fully aggregate the cost volume, a three-dimensional attention aggregation encoder-decoder framework containing three sub-modules is proposed. Unlike the standard three-dimensional encoder-decoder structure, a sub-branch and cross-level aggregation encoding module is designed to aggregate context information across sub-branches and levels, so that cost volumes of different depths can make use of one another. A three-dimensional attention recoding module is introduced to recalibrate the high-level semantic information of the sub-branches and obtain a robust, discriminative cost volume. In addition, a stepwise aggregation decoding module is constructed to decode the cost volume, which further improves the learning ability of the cost aggregation network.
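
As a rough illustration of the cost volume that stereo matching networks aggregate and decode, the following Python sketch builds a hand-crafted absolute-difference cost volume for a rectified image pair and reads off disparity by winner-take-all. It is not the dissertation's network: the function names `build_cost_volume` and `winner_take_all`, the absolute-difference cost, and the synthetic test image are all assumptions made for illustration. A learned approach would compute costs from extracted features and replace the plain argmin with a trained three-dimensional aggregation stage.

```python
import numpy as np


def build_cost_volume(left: np.ndarray, right: np.ndarray, max_disp: int) -> np.ndarray:
    """Return an absolute-difference cost volume of shape (max_disp, H, W).

    Assumes a rectified pair, so left pixel (y, x) is compared with
    right pixel (y, x - d) for each candidate disparity d.
    """
    h, w = left.shape
    # Positions where x - d falls outside the right image keep an infinite cost.
    cost = np.full((max_disp, h, w), np.inf)
    for d in range(max_disp):
        cost[d, :, d:] = np.abs(left[:, d:] - right[:, : w - d])
    return cost


def winner_take_all(cost: np.ndarray) -> np.ndarray:
    """Pick, per pixel, the disparity with the lowest matching cost."""
    return np.argmin(cost, axis=0)


# Synthetic check (hypothetical data): the right view is the left view
# shifted by a known disparity, so the shift should be recovered.
rng = np.random.default_rng(0)
left_img = rng.random((4, 32))
true_disp = 5
right_img = np.roll(left_img, -true_disp, axis=1)

cost_volume = build_cost_volume(left_img, right_img, max_disp=8)
disparity = winner_take_all(cost_volume)
```

In this toy setting every column with `x >= true_disp` has an exact match, so the raw argmin is enough. On real images, textureless or occluded (ill-posed) regions yield ambiguous costs, which is the situation that cost aggregation is meant to handle.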
Keywords/Search Tags: Deep learning, Stereoscopic super-resolution, Stereo matching, Attention mechanism, Multi-level information