
Deep Learning-based Visual Motion Estimation And Understanding

Posted on: 2022-01-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Liu
Full Text: PDF
GTID: 1488306332991939
Subject: Control Science and Engineering

Abstract/Summary:
With the rapid development of deep learning, computer vision has achieved remarkable success in image tasks. In recent years, along with the explosive growth of video data and related business, fields such as surveillance and security, autonomous driving, interactive entertainment, and industrial vision have put forward new demands for video applications. However, directly transferring deep learning methods designed for static images to video tasks discards the modeling of information in the temporal dimension, so such methods cannot handle video tasks well. An in-depth study of video tasks is therefore important.

This thesis focuses on visual motion estimation and understanding, and organizes its research on video-related tasks from shallow to deep according to the degree of information processing. Previous research still faces many challenges in both low-level and high-level visual motion tasks. Low-level tasks take scene points or dense pixels as the research object; their biggest challenges are the difficulty of collecting annotated data for supervised learning and the unreliability of unsupervised objective functions. High-level visual motion understanding tasks take object or action instances in the video as the research object; their difficulty lies in achieving efficient and accurate inference at low computational cost. To address these challenges, this thesis studies low-level tasks, including unsupervised learning for depth estimation, optical flow estimation, and other dense-point state estimation tasks, as well as the high-level tasks of multi-object tracking and action detection. The research content and main innovations of this thesis are as follows:

1. For low-level visual tasks such as depth estimation, optical flow estimation, and motion region segmentation, this thesis proposes an unsupervised multi-task learning framework with geometric constraints, which allows these tasks to be learned without labeled data. It also proposes to use a classical optimization method to obtain the camera ego-motion and rigid flow from the depth and optical flow estimated by the network, and then to distinguish the motion regions in the scene according to the view synthesis error. Introducing customized consistency losses for the moving and stationary regions respectively further improves the prediction accuracy of all subtasks.

2. To address the problem that the objective in unsupervised optical flow estimation is unreliable in complex scenes such as large motion, occlusion, and extreme lighting conditions, this thesis proposes a novel analog learning method. Various transformations are designed to construct analog samples, and the predictions for the original samples are used to provide more reliable supervision signals for the analog samples. In addition, this thesis designs a highly shared recurrent architecture for the optical flow decoder and proposes a multi-frame extension. This not only significantly reduces the number of parameters and the amount of computation, but also achieves accuracy similar to mainstream supervised learning methods, with much better generalization performance.

3. For the multi-object tracking task, this thesis proposes to combine the object detection, re-identification, and motion estimation subtasks into a single anchor-free network with multi-task learning. It also proposes a recurrent motion estimation branch and a chained memory inference strategy to achieve accurate motion estimation with few additional parameters and little extra computation, thus reducing the dependence of multi-object tracking on complex association algorithms. In addition, this thesis proposes a method to train the tracking network with static images from a detection dataset, yielding an easy-to-train multi-object tracking method.

4. To address the practical deployment of visual motion research, this thesis conducts applied research on motion tracking and understanding tasks in industrial assembly processes, and proposes a data-driven visual motion tracking and action understanding system that achieves real-time spatio-temporal detection of hand movements during assembly. On the algorithmic side, this thesis proposes an algorithm for frame-by-frame multi-object tracking with cross-frame detection, which fuses the predictions of high-frequency trackers and low-frequency detectors to achieve efficient object tracking with low computational overhead. In addition, this thesis proposes generic lightweight modifications for the image object detection network and the video action detection network respectively, which further enable the whole video analysis system to run in real time on a hardware platform with low computing power.

Through the research described above, this thesis makes progress in the field of visual motion estimation and understanding, and its results have been widely used in academia and industry, complementing frontier research on video motion.
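
As an illustration of the geometric idea in the first contribution, the sketch below computes the rigid flow that camera ego-motion alone would induce at one pixel, given its depth, and flags the pixel as moving when the estimated optical flow departs from that rigid flow. This is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the pinhole intrinsics `(fx, fy, cx, cy)`, the convention that a point moves between frames as `X2 = R * X1 + t`, and the flow-residual threshold are all hypothetical. In particular, thresholding the flow residual is a simplified stand-in for the view synthesis error named in the abstract.

```python
import math


def rigid_flow(u, v, depth, intrinsics, R, t):
    """Flow at pixel (u, v) caused only by camera motion (assumed convention: X2 = R @ X1 + t)."""
    fx, fy, cx, cy = intrinsics
    # Back-project the pixel to a 3D point in the first camera frame.
    X1 = [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]
    # Express the same static scene point in the second camera frame.
    X2 = [sum(R[i][j] * X1[j] for j in range(3)) + t[i] for i in range(3)]
    # Project back to the image plane of the second frame.
    u2 = fx * X2[0] / X2[2] + cx
    v2 = fy * X2[1] / X2[2] + cy
    return (u2 - u, v2 - v)


def is_moving(flow, rigid, threshold=1.0):
    """Flag a pixel as moving when its optical flow deviates from the rigid flow.

    The threshold (in pixels) is a hypothetical parameter chosen for illustration.
    """
    return math.hypot(flow[0] - rigid[0], flow[1] - rigid[1]) > threshold


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    K = (100.0, 100.0, 50.0, 50.0)  # fx, fy, cx, cy (made-up values)
    rigid = rigid_flow(50.0, 50.0, 2.0, K, identity, [0.5, 0.0, 0.0])
    print("rigid flow:", rigid)
    print("static pixel flagged as moving:", is_moving(rigid, rigid))
    print("pixel with extra motion flagged:", is_moving((rigid[0], rigid[1] + 6.0), rigid))
```

In a full pipeline the same computation would run densely over the predicted depth map, and the resulting mask would select which consistency loss applies to each region.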
Keywords/Search Tags:Computer Vision, Deep Learning, Motion Estimation, Multiple Object Tracking