Research On Learning Semantic Projection And Deep Features For Person Re-identification

Posted on: 2021-06-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Dai
GTID: 1488306032997659
Subject: Signal and Information Processing
Abstract/Summary:
In recent years, driven by the push to build safe cities and intelligent security systems, person re-identification (ReID), a key technology in intelligent video analysis, has developed rapidly. It has broad application prospects and considerable commercial value in video detection, intelligent security and intelligent commerce. ReID operates across camera views and aims to retrieve the pedestrians that match a given query. Early ReID work was mainly based on static images, while further tasks such as video ReID, person search in the wild and cross-modality person search have emerged as the technology has matured and application demands have grown. ReID is especially challenging because of significant lighting variations, pose differences and occlusions across camera views. Inaccurate pedestrian detection further increases the difficulty. To address these challenges, the key to improving ReID performance is to design robust features, learn appropriate transformations, train discriminative metrics, or automatically mine powerful features from data with neural networks. This dissertation reviews the stages of ReID development and studies how to learn optimal feature transformations and discriminative feature representations with deep networks. The main contents and contributions of this dissertation are as follows:

(1) A Cross-view Semantic Projection Learning (CSPL) algorithm is proposed for image ReID. Most traditional ReID algorithms focus on designing hand-crafted features and learning discriminative metrics, while the importance of feature transformation is neglected. This dissertation proposes the CSPL algorithm to perform feature transformation. Specifically, the algorithm assumes that there exists a shared semantic basis that captures the intrinsic structure of different views, and that the semantic representations of matched persons are related by a stable association function. The algorithm can therefore simultaneously learn the semantic representation of hand-crafted features, establish the optimal association mapping across views and infer the view-specific semantic projection matrices. In the testing phase, the semantic representation of an instance is obtained by multiplying its features by the corresponding semantic projection matrix. The dissertation also extends CSPL to Multi-view Semantic Projection Learning (MSPL). It is found that, by exploiting more correlation information among multiple views, the latent semantics can be learned better, which improves matching performance. Experimental results demonstrate that learning the semantic representation of hand-crafted features significantly enhances the representation power of the original features and improves matching accuracy. It can also deal effectively with cross-view variation.

(2) An algorithm based on temporal residual learning is proposed for video ReID. To align pedestrians smoothly and use the temporal information in a video sequence effectively, this dissertation proposes the Spatio-Temporal Transformation Network (ST2N) and Temporal Residual Learning (TRL). The whole framework is built on a model that combines Convolutional Neural Networks and Recurrent Neural Networks (CNNs-RNNs). Specifically, the CNN part of the model contains an ST2N module, which uses temporal context from other frames to predict the spatial transformation parameters of the current frame and thereby align the pedestrian sequence. The RNN part is equipped with two bidirectional recurrent units, which extract the general features and the characteristic features of video sequences, respectively. The sum of the two complementary features is used as the enhanced video representation. Experimental results demonstrate that the proposed ST2N module achieves smooth alignment of pedestrians in a video sequence by using temporal context information. The general and characteristic features extracted by the TRL module describe pedestrians from different aspects and provide informative complementary knowledge.

(3) An algorithm based on the Dynamic Imposter based Online Instance Matching (DI-OIM) loss is proposed for person search. To cope with unavailable pedestrian bounding boxes, limited samples for each labeled identity and a large number of unlabeled persons in scene images, the dissertation proposes an end-to-end algorithm based on the non-parametric DI-OIM loss. Specifically, the algorithm jointly optimizes pedestrian detection and person re-identification. To train the person recognition part, the algorithm formulates the DI-OIM loss so that unlabeled persons can be used. Based on the observation that pedestrians appearing in the same image must have different identities, the DI-OIM loss assigns pseudo-labels to the unlabeled persons. The pseudo-labeled and labeled pedestrians can then be used together to optimize pedestrian classification. Experimental results demonstrate that the performance of both tasks improves significantly when pedestrian detection and person re-identification are optimized simultaneously. Compared with a traditional classifier, the non-parametric DI-OIM loss optimizes features directly and learns better features. In addition, compared with other algorithms that also use unlabeled persons, the proposed loss requires the least memory while obtaining the best search performance.
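
The testing-phase step described under contribution (1), where an instance's semantic representation is obtained by multiplying its features by the semantic projection matrix of its view, can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the dissertation: the function name `semantic_rank`, the random projection matrices, the identity association mapping, the toy dimensions and the use of Euclidean distance for ranking. The sketch does not cover how CSPL learns these matrices.

```python
import numpy as np

def semantic_rank(query, gallery, proj_query, proj_gallery, association):
    """Rank gallery instances for one query in a shared semantic space.

    query:        (d,) feature vector from the query view
    gallery:      (n, d) feature vectors from the gallery view
    proj_query:   (k, d) projection matrix of the query view
    proj_gallery: (k, d) projection matrix of the gallery view
    association:  (k, k) mapping from query-view to gallery-view semantics
    """
    # Semantic representation = projection matrix times the feature vector.
    query_sem = association @ (proj_query @ query)
    gallery_sem = gallery @ proj_gallery.T
    # Assumption: rank by Euclidean distance in the semantic space.
    distances = np.linalg.norm(gallery_sem - query_sem, axis=1)
    return np.argsort(distances)

# Hypothetical stand-ins for learned quantities (random, for illustration).
rng = np.random.default_rng(0)
feat_dim, sem_dim, num_gallery = 8, 3, 5
proj_a = rng.standard_normal((sem_dim, feat_dim))  # query-view projection
proj_b = rng.standard_normal((sem_dim, feat_dim))  # gallery-view projection
association = np.eye(sem_dim)                      # assumed identity mapping

# Build toy features whose projections equal chosen semantic vectors,
# so that the query shares its semantics with gallery item 2.
semantics = rng.standard_normal((num_gallery, sem_dim))
gallery = semantics @ np.linalg.pinv(proj_b).T
query = np.linalg.pinv(proj_a) @ semantics[2]

ranking = semantic_rank(query, gallery, proj_a, proj_b, association)
print("best match:", int(ranking[0]))
```

Because the toy query is constructed to share its semantic vector with gallery item 2, that item ranks first. With learned matrices the match would be approximate rather than exact.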
Keywords/Search Tags: Person re-identification, Semantic projection, Temporal residual, Spatial-temporal transformation, Person search