| With the rapid development of modern science and technology, large-scale camera systems have become possible. In particular, new generations of mobile cameras, such as smartphone cameras and GoPro devices, have driven the fast development of camera networks. These new techniques enable more effective and efficient understanding and analysis of the characteristics of, and relationships among, multiple targets, especially pedestrians, in various kinds of scenes. In many vision applications, identifying the same person across different views plays an important role. However, traditional approaches often use one or two cameras to capture video, and in most cases the cameras are fixed and therefore have a limited field of view. The emergence of mobile cameras makes up for this limitation. In this thesis, we use multiple mobile cameras to study the important problem of identifying the same target in multi-view images taken by different cameras at the same time; we call this problem Multi-view Multihuman Association (MvMhA). Unlike previous work on human association across two views, this thesis focuses on more general and more challenging cases with multiple (two or more) views and non-fixed views. In addition, each person in the scene may appear in all the views or only in a subset of them, and this is not known a priori. To address these challenges, we develop a new framework based on an end-to-end deep network and conduct in-depth research by developing two methods, based on a Recurrent Neural Network (RNN) and a Graph Neural Network (GNN), respectively: (1) The RNN-based method first uses an appearance-based deep network to extract the features of each object in each image. A comprehensive affinity matrix is then constructed by calculating the pairwise similarity scores between all detected targets. Finally, we propose a Deep Assignment Network (DAN) to transform the affinity matrix into an assignment matrix, which provides a binary assignment result for MvMhA. (2) The
GNN-based method likewise first uses the feature extraction network to extract the features of each object in each view, and then uses these features as the node features of a graph neural network to construct a multi-view, multi-object graph matching network model. In addition, we build a synthetic matrix dataset, a synthetic image dataset, and a real image dataset to verify the effectiveness of the proposed methods. We also test the trained network on three other public datasets, where it generalizes well across domains. |
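
The affinity-to-assignment step described above can be illustrated with a minimal sketch. It assumes toy two-dimensional feature vectors in place of learned appearance features, cosine similarity as the pairwise score, and a fixed threshold in place of the learned Deep Assignment Network; the function names `cosine`, `affinity_matrix`, and `threshold_assignment` and the threshold `tau` are hypothetical and are not taken from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def affinity_matrix(features):
    """Pairwise similarity scores between all detections from all views."""
    n = len(features)
    return [[cosine(features[i], features[j]) for j in range(n)] for i in range(n)]


def threshold_assignment(affinity, view_ids, tau=0.8):
    """Binary assignment matrix: 1 if two detections from different views match.

    Thresholding is only a stand-in (an assumption) for the learned mapping
    from affinity matrix to assignment matrix.
    """
    n = len(affinity)
    return [
        [
            1 if view_ids[i] != view_ids[j] and affinity[i][j] >= tau else 0
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy detections: person A is seen in views 0 and 1, person B in views 0 and 2,
# so each person appears only in a subset of the three views.
features = [
    [1.0, 0.0],  # view 0, person A
    [0.0, 1.0],  # view 0, person B
    [0.9, 0.1],  # view 1, person A
    [0.1, 0.9],  # view 2, person B
]
view_ids = [0, 0, 1, 2]

affinity = affinity_matrix(features)
assignment = threshold_assignment(affinity, view_ids)

for row in assignment:
    print(row)
```

In this toy case the assignment matrix links detection 0 with detection 2 and detection 1 with detection 3, and leaves same-view pairs unmatched. A fixed threshold gives no guarantee of consistency across three or more views, which is one reason a learned assignment step is attractive.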