| To address growing public-safety challenges, China has invested heavily in building a video surveillance network. With the spread of video surveillance systems, the Chinese security industry has entered the era of big data, and video investigation has become a key means for police to solve cases. In practice, investigators must watch vast amounts of surveillance video from many cameras to find images of the target’s activity and its trajectory, so as to quickly capture, check, and track the suspected target. However, conventional video investigation searches for suspects mainly through manual viewing and judgement, which demands a great deal of manpower and time and can easily miss the best window for solving the case. It therefore cannot meet the requirements of modern criminal investigation, which has motivated the development of object tracking for video investigation. Object tracking is the technique of using computer vision and machine learning to estimate the trajectory of a target of interest in a video. It can help investigators track suspected targets across huge volumes of video, improving the efficiency and logging rate of public security organs. Hence, object tracking has significant research and application value. Over the past decade, object tracking has been a research hotspot and has achieved excellent performance on many public datasets. However, in the surveillance scenes of video investigation, factors such as occlusion, camera viewpoint, and illumination changes make tracking very challenging. This dissertation addresses four problems of object tracking in video investigation, namely partial occlusion, contextual distractors, identity switching, and cross-camera discrepancy. We construct a novel research system for object tracking with a hierarchically increasing perception view, and carry out corresponding studies on the perception of object
structure, perception of object and context, perception of the object’s temporal motion, and perception of the object’s identity across cameras, which together address the major technical bottlenecks in video investigation. The four problems are as follows: (i) Object observation is not comprehensive. Partial occlusion is common, yet tracking methods that build observation models from only the holistic target cannot perceive the structural information of the object and perform poorly in scenes with partial occlusion; (ii) Object representation is not robust. The context usually contains many distractors similar in appearance to the target, and tracking methods that learn features from the object region alone cannot perceive the difference between the target and its context, resulting in tracking failure; (iii) Object identification is suspect. Objects exhibit various motion patterns, e.g., disappearing and reappearing, overlapping, and turning back. Tracking methods that build identification models with a fixed distance metric cannot perceive the temporal motion of the object, which easily results in identity switching; (iv) Object association is not correct. In surveillance, an object may appear in multiple cameras, and association models that rely only on single-camera characteristics cannot handle the cross-camera characteristics of objects well, resulting in wrong matches. To this end, this dissertation focuses on the key technologies of object tracking in video investigation and achieves the following innovative results: (1) Object observation based on structural correlation filters. Most existing trackers model the object as a single whole and thus cannot handle partial occlusion well. To this end, this dissertation proposes a novel object observation method based on structural correlation filters, which encodes the global and local information of the target using correlation filters. During tracking, the visible parts can provide reliable cues for
tracking. In addition, we construct a triangle-structure model to measure the spatial relationships among parts. These components are used jointly for tracking, so as to address partial occlusion in video investigation. In experiments, the proposed method achieves 82.5% and 75.1% precision on the OTB50 and OTB100 datasets, outperforming state-of-the-art methods by 2% and 5.6%, respectively. (2) Object representation based on a context-perceptive Siamese network. Most existing tracking works focus on the target region’s own characteristics for feature learning and do not fully exploit the contextual information around the target. When the context contains distractors of similar appearance, these trackers fail to track the target accurately. To this end, this dissertation proposes a novel object representation method based on a context-perceptive Siamese network. Concretely, we construct a target-versus-context awareness model that distinguishes target pixels from the contextual background and strengthens the tracker’s resistance to interference in complex scenarios. Furthermore, we integrate a correlation filter into the Siamese network to enable end-to-end learning and testing. In experiments, the proposed method achieves a 66.0% AUC score on OTB100 and a 26% EAO score on VOT2017, outperforming state-of-the-art methods by 6.6% and 2%, respectively. (3) Object identification based on a channel-attention ovonic insight network. Most existing tracking methods perform object identification with a fixed distance metric, which cannot perceive how the object’s temporal motion evolves. This results in identity switching under complicated motion patterns. To this end, this dissertation proposes a novel object identification method with a channel-attention ovonic insight network, which integrates feature learning, similarity measurement, and identity assignment into an end-to-end network. Concretely, we add a channel-attention
branch to generate more robust feature representations. Then, we propose an ovonic insight network that performs similarity measurement and ID assignment adaptively and can handle the typical cases of objects leaving and entering the scene. In experiments, the proposed method achieves a 38.9% MOTA score on MOT2015 and a 52.7% MOTA score on MOT2017, outperforming state-of-the-art methods by 7.2% and 2%, respectively. (4) Cross-camera object association with discrepancy description. Existing methods for cross-camera object association usually focus on the self-characteristics of object images in a single camera and cannot effectively handle cross-camera discrepancy. To this end, this dissertation proposes a novel cross-camera object association method based on discrepancy description. By investigating the cross-camera variation of an object’s characteristics, we aim to mine object data with cross-domain consistency. We therefore construct a discrepancy descriptor that represents an object sequence by its discrepancies from other trajectories. The discrepancy descriptor can handle cross-camera variation and substantially improves matching performance, offering a new insight into object representation in cross-camera tasks. In experiments, the proposed method achieves 86.6%, 71.2%, and 72.5% Rank-1 scores on the PRID2011, iLIDS-VID, and MARS datasets, outperforming state-of-the-art methods by 2%, 5.8%, and 2.8%, respectively. In conclusion, starting from the essence of object tracking, this dissertation regards tracking as a process of continually perceiving the target over the video sequence, and constructs a novel research system for object tracking with a hierarchically increasing perception view. It addresses major technical bottlenecks in video investigation, such as incomplete object observation, non-robust object representation, suspect object identification, and incorrect object association. The theoretical achievements of the studies on the perception of object
structure, perception of object and context, perception of the object’s temporal motion, and perception of the object’s identity across cameras will provide a new approach, in both theory and key technologies, to object tracking in practical video investigation. |
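
As a rough illustration of the discrepancy-descriptor idea in contribution (4), the following Python sketch represents a trajectory by its distances to a set of reference trajectories observed in the same camera. It is not the dissertation's implementation: the function names (`mean_feature`, `discrepancy_descriptor`, `match`, `shift`), the use of mean-pooled features with Euclidean distance, and the toy assumption that a camera change acts as a constant feature offset are all hypothetical choices made for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_feature(sequence):
    """Average the per-frame feature vectors of one trajectory."""
    n = len(sequence)
    return [sum(column) / n for column in zip(*sequence)]


def discrepancy_descriptor(sequence, references):
    """Describe a trajectory by its distances to reference trajectories.

    Assumption for this sketch: the references come from the same camera
    as `sequence`, so a camera-wide feature shift cancels out.
    """
    center = mean_feature(sequence)
    return [euclidean(center, mean_feature(ref)) for ref in references]


def match(query_descriptor, gallery_descriptors):
    """Return the index of the gallery descriptor closest to the query."""
    return min(
        range(len(gallery_descriptors)),
        key=lambda i: euclidean(query_descriptor, gallery_descriptors[i]),
    )


def shift(sequence, offset):
    """Toy camera change: add a constant offset to every feature value."""
    return [[x + offset for x in frame] for frame in sequence]


# Toy data: two reference trajectories and two persons seen in camera A.
cam_a_refs = [[[0.0, 0.0], [0.0, 0.0]], [[4.0, 0.0], [4.0, 0.0]]]
person_p_a = [[1.0, 0.0], [1.0, 0.0]]
person_q_a = [[3.0, 1.0], [3.0, 1.0]]

# Camera B sees the same trajectories with a constant feature offset.
cam_b_refs = [shift(ref, 10.0) for ref in cam_a_refs]
person_p_b = shift(person_p_a, 10.0)
person_q_b = shift(person_q_a, 10.0)

query = discrepancy_descriptor(person_p_a, cam_a_refs)
gallery = [
    discrepancy_descriptor(person_q_b, cam_b_refs),
    discrepancy_descriptor(person_p_b, cam_b_refs),
]
best = match(query, gallery)  # index of the camera-B trajectory matched to P
```

In this toy setting the raw features of person P differ between the two cameras, but the descriptor built from within-camera distances is unchanged, so P in camera A is matched to P in camera B. Real cross-camera variation is far less regular than a constant offset, which is why this is only a sketch of the intuition.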