
Derivative-Free Optimization Method of Convolution-Based Models for Few-Shot Learning

Posted on: 2022-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Li
Full Text: PDF
GTID: 2518306323978269
Subject: Computer software and theory
Abstract/Summary:
The Convolutional Neural Network (CNN), a feedforward neural network, has proven highly successful in data-intensive applications and achieves excellent performance in image processing. However, it may be hampered when the data set is small, mostly because both gradient descent and the convolutional structure require a sufficiently large amount of data. Since the empirical risk minimizer is no longer reliable in few-shot learning scenarios, recent CNN-based deep learning methods built on gradient descent, network restructuring, or data augmentation have achieved only limited performance improvements. In addition, the convolutional and pooling structures introduce some information loss.

To tackle this problem in the object-classification setting of image processing, this thesis takes AlexNet as an example and proposes a derivative-free optimization method for few-shot learning based on CNN pre-trained models. First, each sample is generalized into a series of samples through causal intervention and data augmentation, converting non-sequential data into sequence data and at the same time enlarging the sample data set. Based on optimal transport theory and a co-integration test, the feature-extraction ability of the sampling points in the model is evaluated from the perspective of the stability of the data distribution. This evaluation then guides directional network pruning of the pre-trained model, removing part of the noise information extracted by the pre-trained model and making the model a better descriptor of the data distribution. Next, based on the Capital Asset Pricing Model and optimal transport theory, forward learning without gradient propagation is performed on the intermediate outputs of the pre-trained model: a new structure is constructed and optimized, and sampling points are selectively recombined according to the collaborative relationships between them, generating representation vectors with clear inter-class separation in the distribution space and thereby removing the dependence on the gradient-descent algorithm. Finally, the intermediate output of the network adaptively generates effective features and forms the embedding representation vector through a self-attention module.

Experimental results show that the proposed method effectively improves the performance of CNN-based pre-trained models such as AlexNet and ResNet in few-shot learning scenarios: on the animal subset (about 100 categories) of ImageNet 2012, accuracy rises from 58.85% and 78.51% to 68.50% and 86.75%, respectively.
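The distribution-stability pruning step described above can be pictured with a minimal sketch: each channel of an intermediate feature map is scored by the 1-D Wasserstein distance between its activation distributions under two augmented views of the same support samples, and the most stable channels are kept. This is an illustrative approximation only, not the thesis's exact procedure; the function names (channel_stability_scores, prune_mask), the keep_ratio parameter, and the stability criterion are assumptions, and the co-integration test the thesis also uses is omitted here.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def channel_stability_scores(feats_a, feats_b):
    """Per-channel 1-D Wasserstein distance between activation distributions.

    feats_a, feats_b: float arrays of shape (n_samples, n_channels, h, w),
    e.g. intermediate feature maps of a pre-trained CNN for two augmented
    views of the same few-shot support samples. A smaller distance means
    the channel's output distribution is more stable under intervention.
    """
    n_channels = feats_a.shape[1]
    scores = np.empty(n_channels)
    for ch in range(n_channels):
        # Flatten all spatial activations of this channel across samples
        a = feats_a[:, ch].ravel()
        b = feats_b[:, ch].ravel()
        scores[ch] = wasserstein_distance(a, b)
    return scores

def prune_mask(scores, keep_ratio=0.8):
    """Boolean mask keeping the most distribution-stable channels."""
    k = max(1, int(keep_ratio * len(scores)))
    keep = np.argsort(scores)[:k]   # smallest distances = most stable
    mask = np.zeros(len(scores), dtype=bool)
    mask[keep] = True
    return mask

# Toy usage with synthetic feature maps standing in for AlexNet activations
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(20, 64, 6, 6))
feats_b = feats_a + rng.normal(scale=0.1, size=feats_a.shape)
mask = prune_mask(channel_stability_scores(feats_a, feats_b))
print(f"keeping {mask.sum()} of {mask.size} channels")
```

Channels flagged False would be removed from the pre-trained model before the subsequent gradient-free forward-learning stage; in the thesis this selection is additionally informed by the co-integration test rather than the Wasserstein distance alone.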
Keywords/Search Tags: Capital Asset Pricing Model (CAPM), Wasserstein distance, derivative-free learning, self-attention, pre-trained model