
Research On Optical Flow And Scene Flow Estimation Based On Deep Learning

Posted on: 2021-06-30   Degree: Doctor   Type: Dissertation
Country: China   Candidate: M L Zhai   Full Text: PDF
GTID: 1528306905490624   Subject: Information and Communication Engineering
Abstract/Summary:
Estimating optical flow and scene flow is an important subject in computer vision research and applications, with broad prospects in autonomous driving, intelligent robotics, action and expression recognition, and object tracking. 2D optical flow represents the motion of object pixels projected onto the image plane, and scene flow can be regarded as the extension of optical flow from two dimensions to three: it represents the instantaneous three-dimensional motion vectors of visible points on scene surfaces in the real world. According to the input data, scene flow estimation can be divided into 2.5D binocular scene flow (binocular stereo image sequences), 2.5D monocular scene flow (monocular image sequences), and 3D scene flow (point clouds). The main goal of optical flow and scene flow estimation is to find accurate, efficient, and robust approaches, and many researchers have made great efforts to this end. In recent years, given the powerful ability of deep learning to learn from, process, and analyze data, many researchers have adopted deep learning to tackle optical flow and scene flow estimation. Compared with traditional methods, deep-learning-based optical flow and scene flow estimation methods run faster and achieve higher accuracy, and they have obtained leading results on several public datasets. Nevertheless, although many approaches have been proposed in recent years, motion estimation remains a challenging and unsolved problem. To address the existing problems in this field, this dissertation conducts in-depth research following the progression from 2D optical flow to 2.5D scene flow and then to 3D scene flow, and proposes a series of solutions. The main contents and contributions of this dissertation are as follows:

(1) Existing networks for optical flow estimation rely heavily on training data and neglect prior knowledge that could improve estimation accuracy. This dissertation therefore proposes a knowledge- and data-hybrid-driven approach for optical flow estimation, which combines the complementary advantages of knowledge-driven and data-driven methods. The proposed approach introduces prior knowledge assumptions (brightness constancy, gradient constancy, and spatial smoothness) into encoder-decoder and spatial pyramid architectures, so that the trained model is more consistent with objective knowledge and physical principles. The prior knowledge assumptions are further combined with a stacked architecture to refine the optical flow. In addition, to address the loss of feature detail caused by limited receptive fields, this dissertation combines dilated convolution with the encoder-decoder architecture: multiple dilated convolution layers replace standard convolution layers, enlarging the receptive field while preserving motion details. By setting different dilation rates, the network can change the size of the receptive field without changing the size of the feature maps. The dissertation further introduces a selective kernel mechanism into the spatial pyramid architecture, which adaptively adjusts and selects the convolutional kernel size at each pyramid level. During training, a large amount of labeled data is used to guide the network in supervised learning. Experimental results show that the proposed methods effectively improve the accuracy of optical flow estimation.
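To make the prior-knowledge assumptions concrete, the following is a minimal PyTorch sketch, not the dissertation's implementation, of the three assumptions expressed as loss terms: brightness constancy and gradient constancy between the first image and the flow-warped second image, plus spatial smoothness of the flow. The helper names, the (B, C, H, W) tensor layout, and the loss weights are assumptions made only for illustration.

    import torch
    import torch.nn.functional as F

    def warp(img, flow):
        # Backward-warp img (B,C,H,W) with flow (B,2,H,W) given in pixels (x,y order assumed).
        b, _, h, w = img.shape
        ys, xs = torch.meshgrid(torch.arange(h, device=img.device),
                                torch.arange(w, device=img.device), indexing="ij")
        grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow
        # Normalise coordinates to [-1, 1] for grid_sample.
        gx = 2.0 * grid[:, 0] / (w - 1) - 1.0
        gy = 2.0 * grid[:, 1] / (h - 1) - 1.0
        return F.grid_sample(img, torch.stack((gx, gy), dim=-1), align_corners=True)

    def image_grad(t):
        # Simple forward differences along x and y.
        return t[:, :, :, 1:] - t[:, :, :, :-1], t[:, :, 1:, :] - t[:, :, :-1, :]

    def prior_knowledge_loss(img1, img2, flow, w_grad=1.0, w_smooth=0.1):
        img2_warped = warp(img2, flow)
        # Brightness constancy: I1(x) should match I2(x + flow).
        l_bright = (img1 - img2_warped).abs().mean()
        # Gradient constancy: image gradients should also match after warping.
        g1x, g1y = image_grad(img1)
        g2x, g2y = image_grad(img2_warped)
        l_grad = (g1x - g2x).abs().mean() + (g1y - g2y).abs().mean()
        # Spatial smoothness: penalise large spatial changes of the flow field.
        fx, fy = image_grad(flow)
        l_smooth = fx.abs().mean() + fy.abs().mean()
        return l_bright + w_grad * l_grad + w_smooth * l_smooth

In the hybrid-driven setting described above, such knowledge terms would be combined with a supervised loss on labeled data; the exact weighting and network architectures follow the dissertation, not this sketch.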
(2) Real application scenes often contain rich object context, and different pixels are strongly correlated; extracting and exploiting this object context is crucial for high-level visual understanding tasks. However, existing deep neural networks for 2.5D binocular scene flow estimation do not account for the importance of texture and object contextual information in optical flow and disparity estimation. To address this problem, this dissertation proposes a network based on object contextual perception, which jointly learns stereo matching and optical flow within a unified framework. To extract the object context of a scene, the method integrates an object context-aware sub-network into the joint learning framework, which adaptively gathers object contextual information from a global view according to the similarity between pixels. During training, to address the difficulty of obtaining ground truth, a large amount of unlabeled data is used to guide the network in unsupervised learning. Experimental results show that the extracted object contextual information effectively improves the accuracy of disparity and optical flow estimation.

(3) Existing 2.5D monocular scene flow estimation networks cannot adaptively enhance important features and suppress unimportant ones based on global information, and thus lack the ability to discriminate feature importance. To address this problem, this dissertation proposes a network for 2.5D monocular scene flow estimation based on a dual attention mechanism. The dual attention mechanism models the global dependencies of features along both the channel and spatial dimensions and increases the weight of important features, improving the discriminative and representational power of the features and the robustness to interference in complex scenes. In addition, the method combines the idea of motion decomposition with the dual attention mechanism, introducing attention modules into both the rigid and non-rigid motion estimators. During training, to address the difficulty of obtaining ground truth, a large amount of unlabeled data is used to guide the network in unsupervised learning. Experimental results show that the dual attention mechanism effectively improves the accuracy of monocular depth and optical flow estimation and preserves more motion boundaries.
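As an illustration of the dual attention idea, the sketch below implements a simplified channel-plus-spatial (position) attention pair in PyTorch. It is not the dissertation's module; the layer sizes, the residual fusion with learnable coefficients, and the class name DualAttention are assumptions chosen for the example.

    import torch
    import torch.nn as nn

    class DualAttention(nn.Module):
        # Spatial (position) attention plus channel attention over a feature map.
        def __init__(self, channels, reduction=8):
            super().__init__()
            inter = max(channels // reduction, 1)
            self.query = nn.Conv2d(channels, inter, kernel_size=1)
            self.key = nn.Conv2d(channels, inter, kernel_size=1)
            self.value = nn.Conv2d(channels, channels, kernel_size=1)
            self.gamma_pos = nn.Parameter(torch.zeros(1))
            self.gamma_chn = nn.Parameter(torch.zeros(1))

        def forward(self, x):
            b, c, h, w = x.shape
            n = h * w
            # Spatial attention: global pixel-to-pixel dependencies.
            q = self.query(x).view(b, -1, n).permute(0, 2, 1)            # B x N x C'
            k = self.key(x).view(b, -1, n)                               # B x C' x N
            attn = torch.softmax(torch.bmm(q, k), dim=-1)                # B x N x N
            v = self.value(x).view(b, c, n)                              # B x C x N
            pos_out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
            # Channel attention: global channel-to-channel dependencies.
            feat = x.view(b, c, n)
            chn = torch.softmax(torch.bmm(feat, feat.permute(0, 2, 1)), dim=-1)  # B x C x C
            chn_out = torch.bmm(chn, feat).view(b, c, h, w)
            # Residual fusion of both branches.
            return x + self.gamma_pos * pos_out + self.gamma_chn * chn_out

Because the two branches share the input feature map and are fused residually, such a module could in principle be inserted into both a rigid and a non-rigid motion estimator without changing the feature resolution, in the spirit of the motion-decomposition design described above.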
(4) Existing networks for 3D scene flow estimation typically rely on max-pooling to aggregate features. However, max-pooling keeps only the strongest activation within a local or global region and may discard useful detail. Because the goal of 3D scene flow estimation is to recover an accurate 3D motion vector for every point, feature details are essential for recovering 3D motion. To address this problem, this dissertation proposes a network for 3D scene flow estimation based on point convolution, consisting of a series of point cloud convolutional and deconvolutional layers. The point cloud convolutional layers extract and learn point cloud features, and the point cloud deconvolutional layers refine and propagate features. During training, a large amount of labeled data is used to guide the network in supervised learning. Experimental results show that the point convolution and deconvolution layers effectively improve the accuracy of 3D scene flow estimation, and the proposed method outperforms most state-of-the-art approaches on public datasets.
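The following sketch illustrates, under assumptions, what a point cloud convolutional layer that avoids max-pooling could look like: for each point, continuous weights are predicted from the relative coordinates of its k nearest neighbours, and features are aggregated by a weighted sum so that no neighbour's activation is discarded. The kNN size, MLP widths, and the class name PointConvLayer are illustrative and not taken from the dissertation.

    import torch
    import torch.nn as nn

    class PointConvLayer(nn.Module):
        # A minimal point-convolution sketch with summation-based aggregation.
        def __init__(self, in_dim, out_dim, k=16):
            super().__init__()
            self.k = k
            # Predict per-neighbour, per-channel weights from relative coordinates.
            self.weight_net = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, in_dim))
            self.proj = nn.Linear(in_dim, out_dim)

        def forward(self, xyz, feats):
            # xyz: (B, N, 3) point coordinates; feats: (B, N, C) point features.
            dists = torch.cdist(xyz, xyz)                                     # B x N x N
            knn_idx = dists.topk(self.k, dim=-1, largest=False).indices       # B x N x k
            b, n, _ = xyz.shape
            batch = torch.arange(b, device=xyz.device).view(b, 1, 1)
            nbr_xyz = xyz[batch, knn_idx]                                     # B x N x k x 3
            nbr_feat = feats[batch, knn_idx]                                  # B x N x k x C
            rel = nbr_xyz - xyz.unsqueeze(2)                                  # relative coordinates
            w = self.weight_net(rel)                                          # B x N x k x C
            # Weighted sum over the neighbourhood keeps contributions from every
            # neighbour instead of only the strongest activation.
            aggregated = (w * nbr_feat).sum(dim=2)                            # B x N x C
            return self.proj(aggregated)                                      # B x N x out_dim

One way a matching deconvolutional layer could refine and propagate features, as described above, is to interpolate the aggregated features back onto a denser set of points using a similar neighbourhood weighting; the dissertation's exact layer design should be consulted for the details.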
Keywords/Search Tags:Scene flow estimation, optical flow, selective kernel mechanism, object context, attention mechanism