Font Size: a A A

Object Detection,Tracking And Recognition For Visual Perception And Understanding

Posted on:2019-12-29Degree:DoctorType:Dissertation
Country:ChinaCandidate:L M ZhaoFull Text:PDF
GTID:1368330548977393Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the rapid development of Internet technology,visual data such as video or images has grown exponentially.In the context of big data,how to extract the semantic information from visual data has become one of the essential tasks in computer vision.However,due to the unstructured nature of visual data and the existence of redundancy in background,the research on visual object perception and representation is challenging and of great significance in today's era of data explosion.This thesis focuses on the key technologies of semantic object perception and repre-sentation in visual data.First,object detection and tracking are needed to extract the object of interest in the video or image.Second,recognizing the extracted object to complete the cognitive process.Last,learning structured representation for semantic object aiming at individual object recognition.In this way,we achieve the goal of visual semantic under-standing.The main contributions of this thesis are as follows:1.To address the problem of the detecting object of interest,we propose a multi-task deep saliency model based on a fully convolutional neural network.To understand the semantic content in the visual data,we need to quickly and accurately locate the semantic objects of interest,this task is also called salient object detection.A key problem in salient object detection is how to effectively model the semantic properties of salient objects in a data-driven manner.In this thesis,we propose a multi-task deep saliency model,which takes a data-driven strategy for encoding the underlying saliency prior information,and then sets up a multi-task learning scheme for exploring the intrinsic correlations between saliency detection and semantic image segmentation.Experimental results demonstrate the effectiveness of our approach in comparison with the state-of-the-art approaches.2.For the research on spatial-temporal object tracking,we propose a robust keypoint tracker based on spatio-temporal multi-task structured output optimization driven by dis-criminative metric learning.To achieve the goal of effective object tracking,we jointly optimize the problem in a spatio-temporal multi-task learning scheme.Furthermore,we incorporate this joint learning scheme into both single-object and multi-object tracking sce-narios,resulting in robust tracking results.Experiments over several challenging datasets have justified the effectiveness of our single-object and multi-object trackers against the state-of-the-art.3.Aiming at developing a more powerful object recognition model,we present a deep neural network with a modularized building block,merge-and-run block,which as-sembles residual branches in parallel through a merge-and-run mapping.We show that the merge-and-run mapping can improve information flow and make training easy.In comparison with other networks,our networks enjoy compelling advantages:they contain much shorter paths and the width,i.e.,the number of channels,is increased,and the time complexity remains unchanged.We evaluate the performance on the standard recognition tasks and the proposed approach achieves competitive results.4.For the task of individual object representation and recognition,we take the prob-lem of person re-identification as a typical case,which refers to associating the persons captured from different cameras.We propose a simple yet effective human part-aligned representation for handling the body part misalignment problem.Our approach decom-poses the human body into regions(parts)which are discriminative for person matching,accordingly computes the representations over the regions,and aggregates the similarities as the overall matching score.Unlike most existing deep learning algorithms that learn a global or spatial partition-based local representation,our approach performs human body partition,and thus is more robust to pose changes and various human spatial distributions in the person bounding box.Our approach shows favorable results on person searching with the structural person representation.
Keywords/Search Tags:Object perception, object detection, object tracking, object recognition, deep learning
PDF Full Text Request
Related items