Anomaly Detection And Cross-Modality Person Re-identification In Intelligent Surveillance

Posted on: 2022-01-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Hao
GTID: 1488306602992589
Subject: Circuits and Systems
Abstract/Summary:
Video surveillance provides important auxiliary information for public safety. With the construction of safe cities nationwide, a large-scale surveillance camera network has gradually been established. Intelligently analyzing the large amount of data captured by this network, to accomplish key tasks such as security surveillance deployment and person tracking, is of great value to the construction and development of safe and smart cities. City-level camera networks are widely distributed in space, and the definition of an abnormal event differs from scene to scene. A general anomaly detection algorithm is therefore needed to detect unexpected emergencies across different monitoring scenes. Such an algorithm can raise alerts that give the intelligent video surveillance system a decision-making reference and facilitate the downstream pedestrian detection and person re-identification tasks. In addition, surveillance cameras in cities operate at different hours and under different lighting conditions, so they collect images of different modalities: infrared images in low light or at night, and RGB images during the day or in good light. A cross-modality person re-identification method is therefore needed for cross-modality surveillance videos in different scenes, so that target pedestrians can be retrieved across devices and scenes, providing reliable assistance for trajectory analysis of target pedestrians.

Anomaly detection aims to detect a small number of unknown abnormal events that differ from the normal events in the monitoring scene; these abnormal events also vary from scene to scene. The key challenge is to design an effective model that detects unknown abnormal events using only normal video data. Cross-modality person re-identification, in turn, aims to
separate and process the common information (such as pedestrian identity information and human body structure information) and the individual information (such as pedestrian modality information and pedestrian texture information). The difficulty lies in decoupling the common information from the individual information in the image, that is, in extracting features that are modality-invariant while remaining discriminative. The rise of deep learning has injected new vitality into abnormal event detection and cross-modality person re-identification; in particular, deep learning methods have made remarkable achievements in image reconstruction and image classification. Based on deep learning theory, this dissertation studies anomaly detection and cross-modality person re-identification in intelligent video surveillance and proposes a series of related methods. The main research results are summarized as follows:

1. An anomaly detection method based on a spatio-temporal consistency enhancement network is proposed, which amplifies the disturbance that abnormal events in the video cause in the model's output and thereby enhances the detection of abnormal events in future frames. Because abnormal events in video surveillance are sporadic, it is usually difficult to obtain enough video data of abnormal events for supervised training. Existing video anomaly detection methods therefore mainly adopt unsupervised learning: a deep autoencoder is trained only on video images of normal events, which limits the model's ability to generalize to abnormal events. Video images of abnormal events are then difficult to fit, so an input image is judged abnormal according to whether its reconstruction by the autoencoder is an outlier. However, this reconstruction
method based on the autoencoder considers only the spatial information of each image and ignores the temporal information of the video data. To solve this problem, this dissertation proposes a video future frame prediction framework based on a deep convolutional network, which extracts mixed spatio-temporal information from the input video sequence through three-dimensional convolution and generates the future frame of that sequence. In addition, a spatio-temporal consistency discriminator is designed to enhance the spatio-temporal consistency between the generated future frame and the input video sequence. Combining the future frame prediction framework with the spatio-temporal consistency discriminator effectively increases the disturbance that abnormal events cause in the model's output, thereby improving the detection of abnormal events in future frames. Quantitative experiments on four public databases for abnormal event detection show that the method significantly improves detection accuracy over existing anomaly detection algorithms.

2. A cross-modality person re-identification method based on hypersphere manifold embedding is proposed, and an effective similarity criterion is constructed, which improves the recognition accuracy and retrieval accuracy of cross-modality person re-identification. A key challenge in cross-modality person re-identification is how to extract high-dimensional representations of pedestrian images from two different modalities in the same feature space. Existing methods combine identity constraints and measurement constraints through multi-task learning and optimize the two tasks simultaneously to extract discriminative features from pedestrian images. Because the two constraints are not measured on a uniform scale, the identity constraint and the
measurement constraint often fail to reach their optimal solutions at the same time, which harms the discriminability of the features. To address this problem, pedestrian image features are mapped onto a hypersphere manifold in a high-dimensional feature space, and the angle between features on the manifold is used as the basis for both classification and measurement, so as to unify the two constraints. A two-way ranking loss is also proposed to improve the discriminability and robustness of the features. In addition, a feature decorrelation method based on singular value decomposition is introduced to further increase the inter-class distance of cross-modality pedestrian image features, making them better suited to recognition. The method has been experimentally verified on two public data sets. The results show that, compared with state-of-the-art cross-modality person re-identification methods, it achieves a significant improvement in recognition rate and retrieval rate.

3. A cross-modality person feature registration method based on dual-alignment feature embedding is proposed, which provides an effective solution to the modal and spatial misalignment of cross-modality person features. In recent years, methods for cross-modality person re-identification have mainly focused on modal differences. However, because pedestrian posture and camera viewing angle vary, changes in the posture and position of the pedestrian in the image also affect the extraction of discriminative features from pedestrian images. To address the spatial misalignment of pedestrian images caused by posture and camera perspective, a fine-grained feature extraction method for pedestrian images based on the idea of image segmentation is proposed. In addition, to address the modal misalignment of pedestrian image
features, two loss functions are designed to constrain the modal consistency of pedestrian image features: 1) an intra-class distribution loss function, proposed from the perspective of the high-dimensional spatial distribution of features; and 2) an inter-class correlation loss function, proposed from the perspective of feature correlation. Using fine-grained features together with modal consistency constraints effectively solves both the modal misalignment and the spatial misalignment of cross-modality pedestrian features. The method has been experimentally verified on the two public data sets. The results show that it effectively brings cross-modality pedestrian image features into a dual-alignment state in both space and modality.

4. A person re-identification method based on a modality adversarial strategy is proposed, which effectively enhances the modality invariance of pedestrian features and improves re-identification performance. The relationships among cross-modality pedestrian images in the feature space are affected not only by appearance information but also by modality changes. Existing methods reduce the modal difference by mapping the two modalities into the same feature space, but they lack a basis for measuring how well the feature mapping works. To address this problem, a modality discriminator based on a deep convolutional network is designed to measure the modal consistency of the mapped features. In addition, a modality adversarial strategy is proposed to train the feature extractor and the modality discriminator at the same time. Through adversarial training of the modality discriminator and the feature extractor, the modality invariance of cross-modality pedestrian image features is effectively enhanced. The method proposed in this dissertation uses two different feature
extractors on the two public data sets in the experiments. The experimental results show that the method not only constrains the modality mapping process and improves recognition performance but is also applicable to different feature extractor structures.
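
As an illustration of how a future frame prediction model (contribution 1) can be turned into a per-frame anomaly score, the following is a minimal sketch. The abstract does not state the scoring rule, so everything here is an assumption: the sketch uses PSNR between the predicted and the actual frame, min-max normalised over one video, which is a common choice for prediction-based detectors. The function names `psnr` and `anomaly_scores` and the toy frames are hypothetical.

```python
import math


def psnr(predicted, actual, max_value=1.0, eps=1e-12):
    """Peak signal-to-noise ratio between two equally sized frames.

    Frames are flat sequences of pixel intensities in [0, max_value].
    ASSUMPTION: PSNR is used as the prediction-quality measure; the
    abstract does not specify one.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("frames must be non-empty and of equal length")
    mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    # eps keeps the ratio finite when the prediction is perfect.
    return 10.0 * math.log10(max_value ** 2 / (mse + eps))


def anomaly_scores(psnr_values):
    """Min-max normalise per-frame PSNR over one video and invert it.

    0.0 marks the best-predicted frame, 1.0 the worst-predicted frame.
    """
    lo, hi = min(psnr_values), max(psnr_values)
    if hi == lo:
        # All frames were predicted equally well: nothing stands out.
        return [0.0 for _ in psnr_values]
    return [1.0 - (v - lo) / (hi - lo) for v in psnr_values]


# Toy example: one well-predicted frame and one poorly predicted frame.
actual = [0.2, 0.4, 0.6, 0.8]
good_prediction = [0.21, 0.39, 0.61, 0.79]
poor_prediction = [0.6, 0.1, 0.9, 0.3]
values = [psnr(good_prediction, actual), psnr(poor_prediction, actual)]
print(anomaly_scores(values))  # [0.0, 1.0]
```

A model trained only on normal events is expected to predict normal frames well and abnormal frames poorly, so frames with a high score are the candidates for an alert. The threshold on the score is left to the application.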
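
The idea in contribution 2 of using the angle between features on a hypersphere as the similarity criterion can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: features are projected onto the unit hypersphere by L2 normalisation, and gallery images are ranked by their angle to the query. The names `l2_normalize`, `angle_between`, and `rank_gallery`, and the example vectors, are hypothetical.

```python
import math


def l2_normalize(feature):
    """Project a feature vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in feature))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in feature]


def angle_between(feature_a, feature_b):
    """Angle in radians between two features after normalisation."""
    if len(feature_a) != len(feature_b):
        raise ValueError("features must have the same dimension")
    a = l2_normalize(feature_a)
    b = l2_normalize(feature_b)
    cosine = sum(x * y for x, y in zip(a, b))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, cosine))
    return math.acos(cosine)


def rank_gallery(query, gallery):
    """Return gallery indices ordered from smallest to largest angle."""
    angles = [angle_between(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: angles[i])


# Toy example: a query feature (e.g. from an infrared image) matched
# against three gallery features (e.g. from RGB images).
query = [1.0, 0.0]
gallery = [[0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]]
print(rank_gallery(query, gallery))  # [1, 0, 2]
```

Because the angle ignores feature magnitude, the same quantity can serve both as the classification criterion and as the retrieval distance, which is the sense in which the two constraints are unified.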
Keywords/Search Tags: Anomaly Detection, Cross-modality Person Re-identification, Deep Convolutional Network, Modality Adversarial Strategy, Hypersphere Manifold Embedding