Research On Learning Adaptive Ranking Functions And Deep Features For Person Search

Posted on: 2020-07-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zhang
GTID: 1368330575956941
Subject: Signal and Information Processing
Abstract/Summary:
Recent years have witnessed the rapid development of person search, which has become a key technology in intelligent video surveillance and plays an increasingly important role in video investigation, human tracking, and behaviour analysis. Person search aims to find a specific pedestrian in surveillance images or videos using computer vision techniques. Early work on person search mainly addressed cross-view person re-identification, while further tasks such as video-based person search, person search in the wild, and person search with natural language queries have emerged as research has progressed and practical applications have expanded. Traditional techniques focus on extracting hand-crafted features and learning discriminative metrics, whereas researchers now concentrate on learning powerful features from data with neural networks. This dissertation reviews the development stages of person search and studies how to learn better ranking functions and more powerful features. Its contributions are as follows:

(1) An approach to person re-identification based on sample-specific SVM learning and Least Square Semi-Coupled Dictionary Learning (LSSCDL) is proposed. Most existing methods learn a single fixed ranking function to measure the similarity between all images and do not give sufficient consideration to the individuality of each pedestrian. This dissertation formulates person re-identification as an imbalanced classification problem and learns a classifier specifically for each pedestrian, so that the matching model is highly tuned to the individual's appearance and provides more discriminative measurements for finding the best candidate. The proposed LSSCDL algorithm learns a feature dictionary, a ranking function dictionary, and a mapping function simultaneously, through which the weight parameters of a new sample can be easily inferred from its feature patterns. Compared with traditional dictionary learning methods that use l1-norm regularization, LSSCDL employs the l2-norm, which is much more efficient and better handles the large feature dimension and variations in person re-identification.

(2) A Deep Mutual Learning (DML) strategy for person re-identification is presented. To better balance efficiency and accuracy in deep neural networks, this dissertation proposes a simple but effective way to improve the generalization ability of a network by training it collaboratively with a cohort of other networks. Specifically, each network is trained with two losses: a conventional supervision loss, which measures the difference between predictions and true labels, and an interaction loss, which aligns each network's class posterior with the class probabilities of the other networks. Trained in this way, each network not only learns to distinguish different samples correctly but also learns from the training experience of the other networks, which improves its generalization ability. This dissertation extends DML to larger student cohorts, where search performance improves as the number of peers learning together increases. DML also extends straightforwardly to semi-supervised learning, where performance can be improved by applying the interaction loss to unlabelled data. Finally, this dissertation offers some insight into how and why the DML strategy works: validation experiments show that DML leads to better-quality solutions with more robust minima, which are expected to generalize better.

(3) A cross-modal projection learning algorithm for person search is proposed. To address the problem of retrieving a person given a natural language query, this dissertation proposes a Cross-Modal Projection Matching (CMPM) loss and a Cross-Modal Projection Classification (CMPC) loss for learning discriminative image-text embeddings. The CMPM loss converts the scalar projections between vectors into matching probabilities and minimizes the KL divergence between the projection compatibility distributions and the normalized matching distributions. Compared with canonical correlation analysis and the bi-directional ranking loss, the CMPM loss does not need to select specific triplets or tune a margin parameter, and it remains stable across various batch sizes. For the auxiliary classification task with identity labels, this dissertation proposes the CMPC loss, which integrates cross-modal projection into the norm-softmax loss and classifies the vector projection of the features from one modality onto the matched features from the other modality. The CMPC loss further increases the separability between different classes and the compactness of matched embeddings.
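
The two-loss training described in contribution (2) can be illustrated with a minimal NumPy sketch. It is not the dissertation's implementation. It assumes the supervision loss is softmax cross-entropy, that the interaction loss is the KL divergence from each peer's class posterior to the network's own posterior, and that the peer terms are averaged. The names `dml_losses` and `softmax` and the smoothing constant `eps` are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dml_losses(logits_list, labels, eps=1e-12):
    """Return one loss per network in the cohort (needs at least 2 networks).

    logits_list: list of (batch, classes) arrays, one per network.
    labels:      (batch,) integer class labels.
    Assumption: loss_k = cross-entropy_k + mean over peers of KL(p_peer || p_k).
    """
    probs = [softmax(l) for l in logits_list]
    n, num_nets = len(labels), len(probs)
    losses = []
    for k, p_k in enumerate(probs):
        # Supervision loss: cross-entropy against the true labels.
        ce = -np.mean(np.log(p_k[np.arange(n), labels] + eps))
        # Interaction loss: pull p_k towards every peer's posterior.
        kl = 0.0
        for j, p_j in enumerate(probs):
            if j != k:
                kl += np.mean(np.sum(p_j * (np.log(p_j + eps) - np.log(p_k + eps)), axis=1))
        losses.append(ce + kl / (num_nets - 1))
    return losses
```

In this sketch the interaction term vanishes when all networks agree, so each loss reduces to plain cross-entropy. In actual training each peer's posterior would be treated as a fixed target when updating network k.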
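
The CMPM loss in contribution (3) can be sketched in the same spirit. This is an illustrative reading of the abstract's description, not the author's code. It assumes a batch of paired image and text embeddings that share identity labels, a softmax over scalar projections as the "projection compatibility distribution", and a row-normalized label-match matrix as the "normalized matching distribution". The function name `cmpm_loss` and the constant `eps` are hypothetical.

```python
import numpy as np

def cmpm_loss(img, txt, labels, eps=1e-8):
    """Sketch of a cross-modal projection matching loss.

    img, txt: (batch, dim) embeddings; row i of each belongs to identity labels[i].
    Returns the image-to-text plus text-to-image KL terms, averaged over the batch.
    """
    # q[i, j]: normalized true matching distribution (1/m over the m matches of i).
    match = (labels[:, None] == labels[None, :]).astype(float)
    q = match / match.sum(axis=1, keepdims=True)

    def one_way(a, b):
        # Scalar projection of each a_i onto each unit-normalized b_j.
        b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
        proj = a @ b_unit.T
        proj = proj - proj.max(axis=1, keepdims=True)  # stabilise the softmax
        p = np.exp(proj)
        p = p / p.sum(axis=1, keepdims=True)           # matching probabilities
        # KL(p || q) per row; eps avoids log(0) for unmatched pairs (assumption).
        return np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1))

    return one_way(img, txt) + one_way(txt, img)
```

No triplet sampling or margin appears anywhere in the computation, which is consistent with the abstract's remark that CMPM needs neither. Every pair in the batch contributes through the two distributions.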
Keywords/Search Tags: Person search, Person re-identification, Sample-specific SVM, Semi-coupled dictionary learning, Deep mutual learning, Feature representation learning, Cross-modal retrieval