
Research On The Co-Optimization Of Network Connections And Parameters

Posted on: 2022-11-12    Degree: Doctor    Type: Dissertation
Country: China    Candidate: X D Wang    Full Text: PDF
GTID: 1488306779982479    Subject: Computer Science and Technology
Abstract/Summary:
In recent years, we have been experiencing a new wave of artificial intelligence driven by the development of deep learning techniques. Deep learning relies on deep network models, trained with gradient backpropagation, to fit training data and generalize to new data. This process demands substantial computing resources and has been made feasible by the emergence and development of parallel computing devices. Meanwhile, a large number of effective training algorithms have been proposed that keep pushing the state of the art across deep learning tasks, especially in computer vision and natural language processing. In pursuit of better model performance, researchers keep enlarging neural networks and designing more complex architectures. However, deeper networks not only require more computing resources but also make training more difficult. Moreover, large-scale networks are hard to deploy in the field because existing mobile and vehicle terminals have limited hardware resources. Recently, researchers have observed that trained network models contain a large number of weight connections with very small magnitude, and that removing these connections has little impact on model performance. This has drawn attention to how redundant connections can be continuously suppressed during the training of large-scale networks, so that the network connections are optimized in concert with the parameters.

The main reasons large-scale networks are difficult to train are the large parameter search space and the difficulty of hyperparameter selection. Co-optimizing network connections and parameters can, on the one hand, shape the parameter search space by suppressing the expression of redundant connections, which improves the performance of large-scale networks; on the other hand, it can sparsify the network by directly pruning the suppressed connections, which significantly reduces the model's parameter count and computation and facilitates deployment on edge devices. Therefore, this thesis studies the co-optimization of network connections and parameters in depth to address problems in (a) sparse network training, (b) network regularization training, and (c) overfitting caused by a mismatch between network and data scale, approaching them through training algorithms, regularization terms in the loss function, and hyperparameter self-learning, and applies the results to several scenario tasks (e.g., image classification and image retrieval):

(1) To address the tendency of projection-operator-based sparse network training to prune important connections, this thesis proposes a sparse network training method based on a gradual projection operator. The method gradually suppresses the gradients of connections slated for pruning through a family of gradual projection operators; this avoids deleting those connections outright and gives important connections that mistakenly entered the pruning queue a chance to be reactivated, which improves the performance of the resulting sparse network. On image classification, the sparse models obtained with the proposed training algorithm outperform state-of-the-art baselines, especially at extreme sparsity levels.
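To make the idea in (1) concrete, the following is a minimal sketch of gradual gradient suppression. It is not the thesis's exact operator family: the function name, the linear decay schedule, and the magnitude-based pruning criterion are illustrative assumptions.

    import torch

    def gradual_suppression_mask(weight, sparsity, step, total_steps):
        """Scale factor for the gradients of connections slated for pruning.

        Connections whose magnitude falls below the sparsity threshold keep a
        shrinking (but initially nonzero) gradient, so connections that entered
        the pruning queue by mistake can still be reactivated early in training.
        """
        k = int(sparsity * weight.numel())
        if k == 0:
            return torch.ones_like(weight)
        threshold = weight.abs().flatten().kthvalue(k).values  # k-th smallest magnitude
        decay = max(0.0, 1.0 - step / total_steps)             # goes from 1 to 0 over training
        mask = torch.ones_like(weight)
        mask[weight.abs() <= threshold] = decay
        return mask

    # Usage sketch: after loss.backward(), before optimizer.step():
    # for p in model.parameters():
    #     if p.grad is not None:
    #         p.grad.mul_(gradual_suppression_mask(p.data, sparsity=0.9,
    #                                              step=step, total_steps=total_steps))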
(2) Existing searches for winning tickets (i.e., sparse sub-networks that, when trained from scratch, can match or even surpass their corresponding dense networks) require multiple search rounds, and the winning tickets they find do not perform well at high sparsity. This thesis proposes a sparse reparameterization method based on layer-wise continuous sparsification of the network. We first analyze the baseline (a winning-ticket search method based on continuous sparsification) and observe that (a) the multi-round search mechanism is effective in improving the performance of the found winning tickets, and (b) how the distribution of the connection-importance parameter (i.e., the gate parameter) is learned has a significant impact on that performance. The thesis therefore performs a layer-wise search for winning tickets, compressing a single search round from one full training run to one training epoch so that more search rounds can be carried out. On image classification, the winning tickets found by our method outperform state-of-the-art baselines, especially at extreme sparsity levels.

(3) To address the insufficient generalization of large-scale network training, this thesis proposes a layer-wise continuous sparsification regularization technique. Unlike common parameter-norm regularization, the proposed regularization does not use the norm of the weights directly as the penalty term; instead, it measures the importance of each connection with a gate parameter and assigns each connection a corresponding gain and gradient flow to achieve the regularization effect (a sketch of this gate mechanism follows the contribution list). Extensive experiments show that layer-wise continuous sparsification regularization further improves the accuracy of the dense network and enhances the model's generalization.

(4) To address the tendency of large networks trained on small datasets (e.g., fine-grained image datasets) to overfit, this thesis proposes a hard dynamic global softmin loss function. This objective can be viewed as a family of loss functions indexed by a learnable parameter, which adapts the loss to the learning of hard samples. On fine-grained image retrieval, the proposed method outperforms the state of the art on several fine-grained benchmark datasets.
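As a rough illustration of the gate-parameter mechanism shared by contributions (2) and (3), the sketch below scales each connection by a learnable sigmoid gate and exposes the mean gate activation as a per-layer penalty. The class name, the sigmoid gate form, and the fixed temperature are assumptions for illustration, not the thesis's exact formulation.

    import torch
    import torch.nn as nn

    class GatedLinear(nn.Module):
        """Linear layer whose connections are scaled by learnable gates."""

        def __init__(self, in_features, out_features, temperature=5.0):
            super().__init__()
            self.weight = nn.Parameter(0.01 * torch.randn(out_features, in_features))
            self.gate = nn.Parameter(torch.zeros(out_features, in_features))
            self.temperature = temperature

        def forward(self, x):
            # Soft mask in (0, 1): the gate measures each connection's importance.
            soft_mask = torch.sigmoid(self.temperature * self.gate)
            return x @ (self.weight * soft_mask).t()

        def gate_penalty(self):
            # Layer-wise penalty pushing gates (and hence connections) toward zero.
            return torch.sigmoid(self.temperature * self.gate).mean()

    # Usage sketch:
    # total_loss = task_loss + lam * sum(m.gate_penalty() for m in model.modules()
    #                                    if isinstance(m, GatedLinear))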
Keywords/Search Tags:Deep learning, sparse network training, regularization training, network connection, dynamic objective functions