Deep long-tailed recognition is one of the most challenging problems in visual recognition: the goal is to train well-performing deep models from large numbers of images whose classes follow a long-tailed distribution. Over the past decade, deep learning has emerged as a powerful approach for learning high-quality image representations and has achieved significant breakthroughs in general visual recognition. However, long-tailed class imbalance is common in practical visual recognition tasks. It easily leads to poor performance on tail classes and limits the practicality of recognition models based on deep networks.
Current research mainly focuses on three directions: class re-balancing, information augmentation, and improvement of neural network modules. This dissertation surveys the state of research in these directions in China and abroad and, in view of the shortcomings of existing work, aims to design efficient long-tailed recognition solutions and promote their use in practical applications. Specifically, it explores how to design reasonable metrics for optimizing an imbalanced representation space; how to use meta-learning to correct the prototype shift of tail categories, which is caused by the imbalanced long-tailed data distribution and in turn leads to a collapse in representation quality; how to exploit the generative ability of diffusion models to alleviate the scarcity of tail-class data from the perspective of data augmentation; and how to exploit the representation ability of vision-language representation frameworks to address the long-tail problem. The research content of this dissertation comprises the following four methods:
· Current representation learning for long-tailed recognition lacks metrics that quantify how closely a classifier or feature space approaches the assumed "balanced" classifier or "discriminative" feature space. To fill this gap, we propose a novel metric, the distribution overlap coefficient, to assess the representation quality of the feature space and of the classifier weights. Building on this coefficient, we develop a vMF (von Mises-Fisher) classifier together with class-wise and feature-class consistency losses that optimize the training process, reducing interference between classifier weights and keeping the distributions of features and classifier weights consistent. Finally, we design a zero-cost post-training calibration algorithm, also based on the distribution overlap coefficient, that alleviates the dominance of head categories in classification decisions during inference. Our model achieves significantly better performance than previous work on long-tailed image classification, semantic segmentation, and instance segmentation, and is competitive with the state of the art.
· When some categories have only a few samples, the training sample distribution may not represent the true data distribution, which shifts the prototypes of tail categories and degrades long-tailed recognition performance. To address this, we propose a decoupled training scheme based on meta-learned feature prototypes. First, in the first-stage feature learning, we use meta-learning to determine the feature prototypes (meta-learned feature prototypes), which improves the quality of the representations learned by supervised contrastive learning. Second, based on these meta-learned feature prototypes, we design a feature-level data augmentation algorithm for tail categories that corrects the biased distribution from which conventional methods sample tail-category data and provides accurate augmented features for tail categories. Comparisons with multiple advanced methods and classical solutions demonstrate the effectiveness and superiority of our method.
· Simple applications of existing data augmentation methods to long-tailed recognition cannot effectively solve the problem of data scarcity. We therefore propose DiffRC, a novel data augmentation method for long-tailed recognition based on diffusion models, which effectively alleviates the imbalance of training samples in long-tailed datasets. Tailored to the characteristics of the long-tail problem, the diffusion model generates semantically diverse augmented samples to address the scarcity of tail data. In the diffusion training stage, we use a paired diversity loss to increase the diversity of the generated tail samples. In the diffusion sampling stage, we guide sampling with a modified prototype contrastive loss instead of the decision-boundary signal of a conventional classifier, which is inaccurate on long-tailed datasets; this mitigates the dominance of head categories in classification decisions. Experimental results show that, even when the training set is long-tailed, our method generates diverse and semantically rich tail samples and alleviates the scarcity of tail categories. DiffRC achieves significant improvements over the baseline model on multiple image classification datasets.
· The feature space of emerging vision-language representation frameworks (such as CLIP) is easily disrupted by long-tailed data, which degrades performance. To address this, we propose a novel vision-language representation framework guided by category feature prototypes. The framework consists of two stages: text-image modality matching training and image recognition fine-tuning. In the matching stage, a contrastive loss guided by category feature prototypes aligns text and image features to prototypes that are evenly distributed in the feature space, preventing long-tailed data from disrupting the feature space. In addition, we propose an irrelevant-text filtering and attribute enhancement module that makes the model ignore irrelevant, noisy text and focus on critical attribute information. In the fine-tuning stage, the framework addresses the positive bias of the trainable classifier with a classifier structure based on category feature prototypes, which compensates for the performance of tail classes while preserving that of head classes. The proposed method achieves superior performance on multiple classic long-tailed datasets.
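The distribution overlap coefficient of the first method is named but not defined in this summary. As a rough illustration only, the sketch below assumes the standard overlapping coefficient, OVL(f, g) = ∫ min(f(x), g(x)) dx, and evaluates it for two class distributions modelled as 2-D vMF (von Mises) densities on the unit circle. The 2-D restriction, the numerical normalization, and the helper names `von_mises_density` and `overlap_coefficient` are hypothetical simplifications, not the dissertation's formulation; a lower value would indicate less overlap between two classes.

```python
import math


def von_mises_density(mu, kappa, n=4096):
    """Density of a 2-D vMF (von Mises) distribution sampled on n grid angles.

    Assumption: the normalizing constant is computed numerically on the grid,
    so no Bessel function is needed.
    """
    step = 2 * math.pi / n
    values = [math.exp(kappa * math.cos((i + 0.5) * step - mu)) for i in range(n)]
    z = sum(values) * step  # numerical normalizing constant
    return [v / z for v in values]


def overlap_coefficient(mu1, kappa1, mu2, kappa2, n=4096):
    """Overlapping coefficient: integral of min(f1, f2) over the circle.

    1.0 means identical distributions; values near 0.0 mean well-separated ones.
    """
    step = 2 * math.pi / n
    f1 = von_mises_density(mu1, kappa1, n)
    f2 = von_mises_density(mu2, kappa2, n)
    return sum(min(a, b) for a, b in zip(f1, f2)) * step


if __name__ == "__main__":
    # Same 90-degree separation, two concentration levels: tighter classes overlap less.
    print(overlap_coefficient(0.0, 4.0, math.pi / 2, 4.0))
    print(overlap_coefficient(0.0, 16.0, math.pi / 2, 16.0))
```

The grid-based integral keeps the sketch dependency-free; a high-dimensional vMF version would need a different estimator, for example Monte Carlo sampling.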
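The feature-level augmentation for tail categories in the second method is described only at a high level. A minimal sketch, assuming that each class already has a feature prototype (for instance a meta-learned one) and that synthetic features are drawn from an isotropic Gaussian around it with a hand-picked standard deviation `sigma`, could top up every class to the size of the largest one. The sampling rule and the function `augment_tail_features` are hypothetical stand-ins, not the dissertation's algorithm. The sketch does make one point concrete: every synthetic feature inherits any error in the prototype, which is why a shifted tail prototype is harmful.

```python
import random


def augment_tail_features(features_by_class, prototypes, sigma, seed=0):
    """Top up every class to the size of the largest class with synthetic features.

    Assumption: synthetic features for class c are drawn from an isotropic
    Gaussian centred on prototypes[c] with standard deviation sigma.
    """
    rng = random.Random(seed)
    target = max(len(feats) for feats in features_by_class.values())
    augmented = {}
    for c, feats in features_by_class.items():
        missing = target - len(feats)  # zero for the largest (head) class
        synthetic = [
            [m + rng.gauss(0.0, sigma) for m in prototypes[c]] for _ in range(missing)
        ]
        augmented[c] = [list(f) for f in feats] + synthetic
    return augmented


if __name__ == "__main__":
    feats = {"head": [[1.0, 0.0]] * 5, "tail": [[0.0, 1.0]]}
    protos = {"head": [1.0, 0.0], "tail": [0.1, 0.9]}
    out = augment_tail_features(feats, protos, sigma=0.05)
    print({c: len(v) for c, v in out.items()})  # both classes now have 5 features
```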
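The paired diversity loss used in the diffusion training stage of DiffRC is not given in closed form here. One simple way to reward diversity, assumed purely for illustration, is to penalize the mean pairwise cosine similarity between samples (or their feature embeddings) generated for the same tail class. In an actual diffusion model this term would be added to the denoising objective and minimized by gradient descent, which this standard-library sketch omits; `pairwise_diversity_loss` is a hypothetical name, and inputs are assumed to be non-zero vectors.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pairwise_diversity_loss(samples):
    """Mean cosine similarity over all pairs of samples generated for one class.

    Lower is more diverse; minimizing this value pushes the samples apart.
    """
    pairs = list(combinations(samples, 2))
    if not pairs:  # a single sample has no pairs to compare
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


if __name__ == "__main__":
    collapsed = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05]]  # near-duplicate generations
    spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    print(pairwise_diversity_loss(collapsed), pairwise_diversity_loss(spread))
```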
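The fourth method aligns text and image features to category prototypes that are evenly distributed in the feature space, but the construction of those prototypes is not stated. A common choice, assumed here, is a simplex equiangular tight frame (ETF): K unit-norm vectors whose pairwise cosine similarity is exactly -1/(K-1). The loss below is a generic InfoNCE-style cross-entropy over feature-to-prototype cosine similarities with a temperature `tau`, standing in for the prototype-guided contrastive loss; the function names and the value of `tau` are hypothetical, and `num_classes` must be at least 2.

```python
import math


def simplex_etf_prototypes(num_classes):
    """K unit vectors in R^K with pairwise cosine -1/(K-1) (a simplex ETF).

    Prototype j is column j of sqrt(K/(K-1)) * (I - 11^T / K).
    """
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for i in range(k)]
        for j in range(k)
    ]


def prototype_contrastive_loss(feature, label, prototypes, tau=0.1):
    """Cross-entropy over cosine similarities between one feature and all prototypes.

    Prototypes are unit-norm, so dividing the dot product by the feature norm
    gives the cosine similarity.
    """
    norm = math.sqrt(sum(x * x for x in feature))
    logits = [sum(f * p for f, p in zip(feature, proto)) / (norm * tau) for proto in prototypes]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


if __name__ == "__main__":
    protos = simplex_etf_prototypes(4)
    feature = [0.9 * a + 0.1 * b for a, b in zip(protos[2], protos[0])]
    print(prototype_contrastive_loss(feature, 2, protos))  # smaller: feature sits near prototype 2
    print(prototype_contrastive_loss(feature, 0, protos))  # larger: wrong class
```

In a CLIP-style setup, both the image embedding and the text embedding of a sample would be scored against the same fixed prototypes. Intuitively, because the prototypes are fixed and equally spaced, head classes cannot claim a larger share of the feature space simply by contributing more samples.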