A Study On Convolutional Neural Networks For Person Re-identification And Visual Tracking

Posted on: 2018-11-25
Degree: Master
Type: Thesis
Country: China
Candidate: Z P He
Full Text: PDF
GTID: 2428330542493435
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
In recent years, growing demand for video surveillance applications such as intelligent monitoring, intelligent transportation, human-computer interaction, and motion analysis has made person re-identification (Re-id) and visual tracking very popular research areas. In video surveillance, person Re-id is the task of recognizing people who have already been captured across a network of non-overlapping cameras. It is a novel research topic with many challenges, such as low resolution, illumination variations, and viewpoint changes. Visual tracking is a challenging research topic in computer vision. The task is to determine the position and motion of a target of interest in a video sequence, which provides a foundation for further semantic analysis. Recently, driven by the emergence of large-scale visual datasets and the rapid growth of computing power, deep learning, and especially convolutional neural networks (CNNs) with their strong capability for learning feature representations, has demonstrated record-breaking performance in many computer vision tasks, e.g. image classification, object detection, and semantic segmentation. In this thesis, we perform an in-depth study of person Re-id and visual tracking and apply deep learning to both. The main research contents are as follows:

1. A deep feature embedding method based on lifted structured loss is proposed for person Re-id. Recently, deep CNNs trained with a triplet loss have become a popular framework for person Re-id. However, it has been reported that triplet-loss-based frameworks cannot make full use of the information in a batch. The recently proposed lifted structured loss, which overcomes this shortcoming of the triplet loss, has shown excellent performance in applications such as image retrieval. However, the lifted structured loss does not take into account changes in the sample distribution, which makes training fluctuate. To solve this problem, we propose a novel structured loss based on the lifted structured loss that eliminates the influence of the sample distribution on training. We verify that the proposed structured loss is superior to the naive contrastive loss and the triplet loss in both accuracy and training speed. Moreover, we combine the proposed structured loss with an identification loss, since the two have complementary properties. Extensive experiments on the CUHK03, CUHK01, and VIPeR datasets demonstrate the superior performance of the proposed method.

2. A fast Fourier transform network, named FFTNet, is proposed for visual tracking. Correlation filter (CF) based trackers have recently gained a lot of popularity in visual tracking due to their impressive performance and efficient computation (more than 100 FPS). However, relying on hand-crafted features and using only the video itself as training data limits the generalization of CF. Thus, some researchers focus on incorporating stronger features for a richer representation of the target. Recently, several attempts have been made to use the internal representations of a CNN as features within the CF framework, with superior performance. However, these methods do not take full advantage of end-to-end learning because CNN feature extraction is separated from the other tracking modules. Other researchers have proposed pre-training a deep CNN on a large-scale dataset and fine-tuning multiple layers of the network in the tracking domain. Although these methods achieve state-of-the-art results, they suffer from high computational loads and cannot operate in real time (usually less than 10 FPS). The proposed FFTNet is a CNN-based tracker that integrates the two main components of CF, i.e. auto-correlation and cross-correlation between the features of two images. Moreover, FFTNet combines the advantages of CF and CNN, so it has a strong capability for learning both feature representations and the matching function. FFTNet is trained end-to-end on PASCAL VOC2012 and ALOV++ in an off-line manner. Experimental results on the visual tracking benchmark OTB50 demonstrate the superior performance and computational efficiency (more than 49 FPS) of the proposed method.
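For readers unfamiliar with the lifted structured loss that the first contribution builds on, the following is a minimal sketch of the original formulation as commonly stated: for each positive pair, all negatives of both samples in the batch are pooled through a log-sum-exp term. It is not the modified loss proposed in the thesis, whose exact form the abstract does not give. The function name, the `margin` default, and the use of plain Python lists are assumptions made for illustration.

```python
import math


def lifted_structured_loss(embeddings, labels, margin=1.0):
    """Original lifted structured loss over one batch (illustrative sketch).

    embeddings: list of equal-length coordinate tuples, one per sample.
    labels:     list of identity labels, one per sample.
    margin:     assumed margin value (hypothetical default).
    """
    n = len(embeddings)
    # Pairwise Euclidean distances between all samples in the batch.
    dist = [[math.dist(embeddings[i], embeddings[j]) for j in range(n)]
            for i in range(n)]

    total = 0.0
    num_positive_pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] != labels[j]:
                continue  # only positive pairs drive the outer sum
            num_positive_pairs += 1
            # Pool every negative of i and every negative of j.
            neg_terms = [math.exp(margin - dist[i][k])
                         for k in range(n) if labels[k] != labels[i]]
            neg_terms += [math.exp(margin - dist[j][k])
                          for k in range(n) if labels[k] != labels[j]]
            if not neg_terms:
                continue  # no negatives in the batch: pair contributes zero
            pair_loss = math.log(sum(neg_terms)) + dist[i][j]
            total += max(0.0, pair_loss) ** 2

    if num_positive_pairs == 0:
        return 0.0
    return total / (2 * num_positive_pairs)
```

Because every positive pair sees all negatives in the batch at once, the loss uses more of the batch than a triplet loss, which looks at one negative per anchor-positive pair. With two well-separated identities the loss should vanish, while interleaved identities should produce a positive value.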
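The efficiency of correlation filter trackers mentioned in the second contribution comes largely from the correlation theorem: circular cross-correlation in the signal domain equals an element-wise product in the Fourier domain. The sketch below illustrates that identity on a 1-D signal using a small radix-2 FFT written with the standard library. It is not FFTNet and does not reproduce any part of the thesis's network; all function names are hypothetical, and a practical tracker would apply a 2-D FFT from a numerical library to feature maps.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def circular_cross_correlation(template, search):
    """Return c[k] = sum_n template[n] * search[(n + k) % N] via the FFT."""
    n = len(template)
    if n == 0 or n & (n - 1) or len(search) != n:
        raise ValueError("inputs must share a power-of-two length")
    t_hat = fft([complex(v) for v in template])
    s_hat = fft([complex(v) for v in search])
    # Correlation theorem: conj(T) * S in the Fourier domain.
    product = [t.conjugate() * s for t, s in zip(t_hat, s_hat)]
    # Inverse transform, normalized by N; inputs are real, so keep the real part.
    return [c.real / n for c in fft(product, inverse=True)]


def locate_shift(template, search):
    """Index of the correlation peak, i.e. the estimated circular shift."""
    response = circular_cross_correlation(template, search)
    return max(range(len(response)), key=response.__getitem__)
```

The peak of the response marks the shift at which the search signal best matches the template, which is the basic localization step in a correlation filter tracker. Computing the response through the FFT costs O(N log N) rather than the O(N^2) of the direct sum.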
Keywords/Search Tags: Person Re-identification, Visual Tracking, Deep Learning, Convolutional Neural Networks, Correlation Filter