Research On Deep Learning Methods Under Limited Labels

Posted on: 2024-04-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S Li
Full Text: PDF
GTID: 1528307340454114
Subject: Computer application technology
Abstract/Summary:
In recent years, deep learning-based methods have achieved breakthroughs in computer vision, and one key to their success is training data with supervised labels. Acquiring such labels consumes substantial manpower and material resources, which has become an important obstacle to the development of computer vision. In the real world, by contrast, large amounts of unlabeled or weakly labeled data are easy to obtain, and how to use such label-limited data effectively has become a focus of current research. Because these data lack complete supervision information, training deep learning methods with limited labels to reach performance comparable to supervised methods remains difficult and calls for further research. This dissertation addresses visual tasks in computer vision under limited labels: it studies how to exploit the large amount of available unlabeled or weakly labeled data and how to narrow the performance gap between label-limited and supervised methods. The main research content and contributions of this dissertation are as follows.

(1) For unsupervised representation learning on large amounts of unlabeled image data, and inspired by the self-organizing map network, this dissertation proposes a self-supervised self-organizing clustering network to learn visual representations of images. The network takes the weights of the self-organizing layer as the clustering centers, quickly computes the similarity between each feature and the clustering centers through the self-organizing clustering head, and converts the similarity into a self-supervised pseudo-label. Feature extraction and feature clustering are thus learned jointly, which yields a better representation. Because optimizing the self-organizing layer is itself a clustering process, the method never computes cluster centers explicitly and therefore does not need to extract or store the features of all images, which simplifies the processing steps and reduces the computation of deep clustering methods.

(2) To address the large number of labeled base class images required in few-shot image classification, this dissertation proposes a few-shot image classification method that constructs tasks from unlabeled base class images. To use the unlabeled images effectively for task construction, a deep clustering model is proposed that learns image features in a specified clustering space. The model first sets up a separable clustering space by fixing the clustering centers, then uses a deep network to extract image features, and finally maps the image features into the clustering space through the designed clustering head. During clustering, an image sampling and task construction strategy is proposed so that tasks can be constructed successfully, and few-shot image classification is achieved through a designed few-shot learning head. Finally, the clustering head and the few-shot learning head are optimized jointly in the form of multi-task learning by sharing the backbone network. Experimental results and visualizations on a range of datasets show that the method generalizes strongly from base classes to novel classes.

(3) For self-supervised representation learning on large amounts of unlabeled image data, this dissertation proposes a simple and effective self-supervised representation learning method based on information entropy theory. From the perspective of reducing information entropy, the method takes minimum entropy as its goal; that is, it optimizes the output vector of the projector to be close to its nearest minimum entropy. The method consists of three steps: normalizing along the batch dimension to avoid model collapse, computing the nearest minimum entropy to obtain the optimization target, and computing the symmetric loss and backpropagating to optimize the network. Experimental results show that, without techniques such as negative sample pairs, predictors, momentum encoders, or cross-correlation matrices, the method learns effective representations and achieves better results with lower complexity.

(4) To address the high cost of obtaining pixel-level annotations in salient object detection, this dissertation proposes an unsupervised salient object detection method based on learning salient features, which removes the bottleneck that existing methods require additional models to introduce saliency information. The method divides image features into salient features, which belong to salient objects, and non-salient features, which do not. It uses the enhancement of salient features and the suppression of non-salient features as the optimization objective, and it realizes unsupervised salient object detection by learning salient features from the data itself. A salient object localization module is proposed to roughly localize the objects where the salient features are located and so obtain an initial salient activation map. Since the objects in the initial saliency map are usually incomplete and contain a lot of noise, a saliency map update strategy is designed to gradually remove the noise and enhance the boundaries. Experimental results and saliency maps demonstrate that the method effectively learns salient visual objects and successfully predicts pixel-level saliency maps, narrowing the performance gap between unsupervised and supervised methods.

(5) For the sparse representation of motion information in video, this dissertation proposes an unsupervised framework for the sparse representation of video motion information based on sketch flow, and designs a scheme that combines the sparse representation with deep learning, taking unsupervised anomaly detection as the benchmark task. In this framework, sparse sketch lines are the basic unit, and representing motion information is transformed into the problem of solving for motion sketch lines. Based on adjacent temporal consistency and local spatial consistency, a sketch flow with sparse representation properties is proposed to model motion information in videos. To verify the effectiveness of the sketch flow, sketch flow features representing motion information are extracted with a graph neural network, and image features representing appearance information are extracted with a convolutional neural network. The two kinds of features are then fused according to the location of the sketch flow to enrich the video features. Visualization and experiments show that sketch flow enables the sparse representation of motion information in videos, and that adding sketch flow to baseline methods further improves the detection accuracy of anomalous events.

(6) To address the difficulty of obtaining frame-level annotations in video anomaly detection, this dissertation proposes a weakly supervised video anomaly detection method that uses video-level annotations as supervision signals. The method exploits the continuity of abnormal events in video: it uses multiple consecutive video clips as the optimization unit, designs a Transformer-based multi-sequence learning network, and constructs a hinge-based multi-sequence learning ranking loss function, which reduces the probability of selecting the wrong anomalous segments compared with methods based on multi-instance learning. During training, a self-training strategy is further proposed that gradually reduces the length of the sequence, so as to gradually refine the anomaly scores and achieve frame-level prediction for video anomaly detection. The effectiveness of the method is verified on three public video anomaly detection datasets, where it achieves good anomaly detection accuracy and accurate prediction of abnormal events.
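As an illustration of the hinge-based multi-sequence ranking idea in contribution (6), the following is a minimal sketch and not the dissertation's implementation. The abstract does not give the loss formula, so several points are assumptions: each video is represented by a list of per-snippet anomaly scores in [0, 1], a sequence is scored by the mean over `k` consecutive snippets, and the hinge margin is 1. The names `max_sequence_score` and `msl_ranking_loss` are hypothetical.

```python
from typing import List


def max_sequence_score(scores: List[float], k: int) -> float:
    """Return the highest mean score over any k consecutive snippets.

    Scoring a sequence by its mean is an assumption made for this sketch.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of snippets")
    # Sliding-window sum over k consecutive snippet scores.
    window_sum = sum(scores[:k])
    best = window_sum
    for i in range(k, len(scores)):
        window_sum += scores[i] - scores[i - k]
        best = max(best, window_sum)
    return best / k


def msl_ranking_loss(
    anomalous: List[float],
    normal: List[float],
    k: int,
    margin: float = 1.0,  # assumed margin; the abstract does not state one
) -> float:
    """Hinge ranking loss between the best k-snippet sequences of two videos.

    The loss is zero once the best sequence of the anomalous video scores
    at least `margin` higher than the best sequence of the normal video.
    """
    top_anomalous = max_sequence_score(anomalous, k)
    top_normal = max_sequence_score(normal, k)
    return max(0.0, margin - top_anomalous + top_normal)


if __name__ == "__main__":
    # Made-up snippet scores, only to exercise the functions.
    anomalous_video = [0.1, 0.2, 0.9, 0.8, 0.1]
    normal_video = [0.1, 0.2, 0.1, 0.3, 0.2]
    print(msl_ranking_loss(anomalous_video, normal_video, k=2))
```

With `k = 1` the sketch reduces to ranking single snippets, as in multi-instance learning. Ranking whole windows with a larger `k` makes the selection less sensitive to one noisy high-scoring snippet, and shrinking `k` over training mirrors the idea of gradually reducing the sequence length.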
Keywords/Search Tags: unsupervised learning, self-supervised learning, weakly supervised learning, deep clustering, deep learning, few-shot learning, salient object detection, video anomaly detection