| Although deep learning has made significant breakthroughs in many fields, it performs poorly in data-scarce settings, where large numbers of samples are difficult to obtain or expensive to label. Few-Shot Learning (FSL) was proposed to address this practical problem: it learns a stable model from only a few samples. FSL therefore reduces deep learning's dependence on target-category data, eases the annotation burden, and offers an effective way to overcome the data bottleneck of deep learning. FSL also transfers knowledge learned from source data to the target domain for reuse. However, FSL is still in its infancy and suffers from weak task generalization. Focusing on the task generalization problem in few-shot image classification, this paper studies two aspects: scene generalization and domain generalization. For scene generalization, we study the few-shot Human-Object Interaction (HOI) recognition problem; for domain generalization, we study the Cross-Domain Few-Shot Learning (CD-FSL) and Domain-Adaptive Few-Shot Learning (DA-FSL) problems. The main contributions are as follows: (1) For scene generalization, we propose a Semantic-guided Attentive Prototypes Network (SAPNet) framework that learns a semantic-guided metric space in which HOI recognition is performed by computing distances to the attentive prototypes of each class. Specifically, the model generates attentive prototypes guided by the category names of actions and objects, which highlight the commonalities among images of the same HOI class. In addition, we design two alternative prototype calculation methods, the Prototypes Shift (PS) approach and the Hallucinatory Graph Prototypes (HGP) approach, which aim to learn suitable category prototype representations for HOI. Finally, to support the few-shot HOI task, we reorganize two HOI benchmark datasets under two split strategies, yielding HICO-NN, TUHOI-NN, HICO-NF, and TUHOI-NF. Extensive 
experimental results on these datasets demonstrate the effectiveness of the proposed SAPNet approach. (2) For scene generalization, we propose Dynamic Graph-In-Graph Networks (DGIG-Net), a novel graph prototypes framework that learns a dynamic metric space by embedding a visual subgraph into a task-oriented cross-modal graph for few-shot HOI. Specifically, we first build a knowledge reconstruction graph that learns latent representations for HOI categories by reconstructing the relationships among visual features, which yields visual representations conditioned on the category distribution of each task. Then, a dynamic relation graph integrates the reconstructed visual nodes with dynamic task-oriented semantic information to explore a graph metric space for HOI class prototypes, exploiting the discriminative information in the similarities among actions or objects. We validate DGIG-Net on multiple benchmark datasets, on which it outperforms existing few-shot learning approaches by a large margin. (3) For domain generalization, we propose a task-expansion-decomposition framework for CD-FSL, called the Self-Taught (ST) approach, which alleviates the problem of non-target guidance by constructing task-oriented metric spaces. Specifically, Weakly Supervised Object Localization (WSOL) and self-supervised techniques are employed to enrich the task-oriented samples by exchanging and rotating discriminative regions, which generates a more abundant task set. These tasks are then decomposed into several subtasks that perform few-shot recognition and rotation classification. This helps transfer the source knowledge to the target tasks and focuses the model on discriminative regions. We conduct extensive experiments under the cross-domain setting with 8 target domains: CUB, Cars, Places, Plantae, CropDiseases, EuroSAT, ISIC, and ChestX. Experimental results demonstrate that the proposed ST approach is applicable to various metric-based models and provides promising improvements in CD-FSL. (4) For domain generalization, we 
propose a Dual Distillation Discrimination Network (D³Net) for DA-FSL. This method applies the idea of distillation discrimination to avoid the overfitting caused by the unequal numbers of samples in the target and source domains. Meanwhile, we use the task distributions and the sample diversity of the source domain to enhance the target domain at two levels: the feature space and the samples. Distillation discrimination aligns the distributions of the source-domain and target-domain data, which not only preserves the distinction between the classes of each domain but also aligns the overall feature distributions. Extensive experiments on mini-ImageNet, tiered-ImageNet, and DomainNet demonstrate that D³Net is competitive in DA-FSL. |
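
The abstract describes recognition as computing distances to the prototypes of each class. As a rough illustration of that general idea only, the sketch below implements plain mean prototypes with nearest-prototype classification, in the style of a basic metric-based few-shot classifier. It is not SAPNet: the semantic guidance, attention, PS, and HGP components are omitted, and the function names, the 2-D embeddings, and the class labels are hypothetical choices made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def class_prototypes(
    support: Sequence[Sequence[float]], labels: Sequence[str]
) -> Dict[str, List[float]]:
    """Average the support embeddings of each class into one prototype."""
    grouped = defaultdict(list)
    for vec, label in zip(support, labels):
        grouped[label].append(vec)
    # zip(*vecs) iterates over embedding dimensions, so each prototype
    # is the per-dimension mean of that class's support vectors.
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query: Sequence[float], prototypes: Dict[str, List[float]]) -> str:
    """Return the label of the prototype closest to the query embedding."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))


# Toy 2-way 2-shot episode with made-up 2-D embeddings (assumption:
# in practice these would come from a learned feature extractor).
support = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
labels = ["ride_bike", "ride_bike", "hold_cup", "hold_cup"]

prototypes = class_prototypes(support, labels)
prediction = classify([1.0, 1.5], prototypes)
print(prototypes)
print(prediction)
```

In this toy episode the query lies much closer to the `ride_bike` prototype than to the `hold_cup` prototype, so it is assigned to `ride_bike`. A method such as the one summarized above would replace the simple mean with a learned, semantically guided prototype, while the distance-based decision step stays conceptually similar.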