
Deep Learning Driven Scene Analysis And Semantic Target Analysis

Posted on: 2018-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: S S Zhao
GTID: 2358330512999461
Subject: Computer Science and Technology
Abstract/Summary:
Semantic object parsing and scene analysis are important research topics in computer vision. They aim to analyze and understand objects and scenes, respectively, in digital images or videos, and have been widely used in intelligent video surveillance, autonomous driving, intelligent transportation, and related applications.

Semantic object parsing is the process of detecting, recognizing, and analyzing objects such as humans and cars. Within it, human parsing, namely decomposing a human image into semantic fashion/body regions and analyzing their attributes, serves as the basis of many computer vision applications. Scene analysis mainly includes estimating the depth of a scene and analyzing its motion and structure. Depth estimation aims to obtain depth information from the corresponding image(s) and helps recover the three-dimensional structure. Motion analysis obtains optical flow information from consecutive frames and can be used for action recognition of moving targets and abnormal event detection. Efficient algorithms for human parsing, depth estimation, and optical flow estimation therefore have crucial practical significance, and this thesis focuses on these three tasks.

In recent years, deep learning has made breakthroughs in a variety of computer vision tasks, e.g. object detection, face recognition, and scene labeling. Designing task-specific network models has received increasing attention from both the academic and industrial communities. In this thesis, we propose different deep models for human parsing, depth estimation, and optical flow estimation, respectively. Specifically:

1. For the depth estimation task, existing related methods are first reviewed. To address the problems that existing deep learning based methods have in modeling spatial context, we propose a data-driven contextual feature learning model and a loss function based on the total variation (TV) norm. The contextual model predicts the depth value by fusing contextual features using location-dependent weights, which are learnt from data. The TV norm based loss function suppresses noise, preserves edges, and makes the depth map smoother. Finally, we demonstrate that combining the two leads to a more effective method.

2. In the optical flow estimation task, deep learning based methods are more efficient and easier to extend than traditional algorithms. However, there are not many works based on deep learning, and the existing deep models often fail in the presence of large displacements. To deal with the large displacement problem, we propose a deep model framework based on multi-scale correlation learning, which handles large displacements efficiently. On some datasets that include large displacements, the proposed framework improves performance significantly over the benchmark. Moreover, since the predicted flow map contains some noise and errors, a model consisting of recurrent neural networks (RNNs) and convolutional neural networks (CNNs) is designed, which refines the predicted maps and achieves a more favorable result.

3. For the human parsing task, in the competition on refined human recognition in surveillance video, we propose two deep models based on Faster R-CNN. One detects and classifies the human parts jointly, while the other divides the unified framework into two steps: we first detect the parts and then recognize the attributes of each part. Experimental results show that the divided framework is robust to interference among classes and thus reduces misclassification.
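
As a rough illustration of the TV norm based loss mentioned for depth estimation, the sketch below adds a total variation penalty to a plain mean squared error. The abstract does not give the exact formulation, so the anisotropic form of the TV norm, the MSE data term, and the names `tv_norm`, `depth_loss`, and `tv_weight` are all assumptions made for this example, not the thesis's definition.

```python
def tv_norm(depth):
    """Anisotropic total variation of a 2D map given as a list of rows.

    Sums the absolute differences between horizontally and vertically
    adjacent values (an assumed form; the thesis may use another variant).
    """
    rows, cols = len(depth), len(depth[0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols:  # right neighbour
                total += abs(depth[i][j + 1] - depth[i][j])
            if i + 1 < rows:  # lower neighbour
                total += abs(depth[i + 1][j] - depth[i][j])
    return total


def depth_loss(pred, target, tv_weight=0.1):
    """Mean squared error plus a weighted TV penalty on the prediction.

    tv_weight is a hypothetical hyperparameter balancing fit and smoothness.
    """
    rows, cols = len(pred), len(pred[0])
    mse = sum(
        (pred[i][j] - target[i][j]) ** 2
        for i in range(rows)
        for j in range(cols)
    ) / (rows * cols)
    return mse + tv_weight * tv_norm(pred)


flat = [[1.0, 1.0], [1.0, 1.0]]   # constant depth: no variation
step = [[0.0, 1.0], [0.0, 1.0]]   # one clean vertical edge
noisy = [[0.0, 1.0], [1.0, 0.0]]  # checkerboard noise

print(tv_norm(flat), tv_norm(step), tv_norm(noisy))
```

In this toy case the checkerboard map is penalized twice as heavily as the clean edge, which is the intuition behind a TV term restraining noise while still tolerating sharp depth boundaries.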
Keywords/Search Tags:deep learning, CNNs, depth estimation, optical flow estimation, human parsing, TV norm, multi-scale correlation learning