
Convexification and Deconvexification for Training Artificial Neural Networks

Posted on: 2017-01-29
Degree: Ph.D.
Type: Dissertation
University: University of Maryland, Baltimore County
Candidate: Gui, Yichuan
Full Text: PDF
GTID: 1478390017451604
Subject: Computer Science
Abstract/Summary:
The purpose of this dissertation research is to overcome a fundamental problem in the theory and application of artificial neural networks (ANNs). The problem, called the local minimum problem in training ANNs, has plagued the ANN community since the mid-1980s.

ANNs trained with backpropagation have been used extensively for decades to solve a wide range of tasks in artificial intelligence. The computing power of ANNs derives from their distributed structure together with their ability to learn and to generalize. However, the local minimum problem has long impeded the application and further development of ANNs, and it has attracted much attention for that reason.

A primary difficulty in solving the local minimum problem lies in the intrinsic non-convexity of ANN training criteria, whose weight spaces usually contain a large number of non-global local minima. Although a great many methods have been developed to optimize the free parameters of the objective function in the hope of consistently reaching a better optimum, none of them resolves the local minimum problem at its root, because the training criterion remains non-convex.

To alleviate this fundamental difficulty, this dissertation proposes a series of methodologies that apply convexification and deconvexification to avoid non-global local minima and to reach global or near-global minima with satisfactory optimization and generalization performance. These methodologies are built on a normalized risk-averting error (NRAE) criterion. The NRAE criterion removes the practical difficulties of computational overflow and ill-initialization that afflicted its predecessor, the risk-averting error (RAE) criterion, while retaining the RAE criterion's ability to handle non-global local minima effectively by convexifying the non-convex error space.
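The criterion just described can be sketched in code. In the NRAE literature the criterion takes the form NRAE_λ(w) = (1/λ) ln[(1/K) Σ_k exp(λ e_k²(w))] over the K squared training errors e_k²; that exact form is an assumption drawn from that literature rather than quoted from this abstract. The log-sum-exp evaluation shown is a standard device for avoiding the overflow that makes the unnormalized RAE criterion impractical:

```python
import numpy as np

def nrae(residuals, lam):
    """Normalized risk-averting error (assumed form from the NRAE
    literature): (1/lam) * log( mean_k exp(lam * e_k^2) ).

    Evaluated with the log-sum-exp trick so that large lam * e_k^2
    terms do not overflow -- the practical failure mode of the
    unnormalized risk-averting error criterion.
    """
    sq = lam * np.asarray(residuals) ** 2
    m = sq.max()                       # subtract the max before exponentiating
    return (m + np.log(np.mean(np.exp(sq - m)))) / lam
```

As λ → 0 this quantity approaches the ordinary mean squared error, and as λ → ∞ it approaches the largest single squared error; large λ is what convexifies the error surface.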
With a proper convexification and deconvexification strategy, the NRAE criterion also proves advantageous for the high-dimensional non-convex optimization of deep neural networks, which typically suffer from local minima, saddle points, and large flat regions in their non-convex error spaces.

In this dissertation, the effectiveness of the proposed NRAE-based methods is first evaluated by training multilayer perceptrons (MLPs) on function approximation tasks, demonstrating their optimization advantage in avoiding or alleviating the local minimum problem relative to training with the standard mean squared error criterion. Moreover, NRAE-based training applied to convolutional neural networks and deep MLPs for recognizing handwritten digits in the MNIST dataset yields better optimization and generalization results than many benchmarks obtained by combining other non-convex training criteria with deep learning approaches. Finally, to further improve the generalization ability of ANNs trained with the NRAE criterion, a statistical pruning method that removes redundant connections is implemented and evaluated.
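A convexification–deconvexification schedule of the kind described above can be sketched as follows. Everything concrete here is an illustrative assumption, not the dissertation's actual algorithm: the schedule direction (train under a large λ so the criterion is convexified, then anneal λ toward zero so it deconvexifies back toward plain MSE), the tiny one-hidden-layer network, and the finite-difference gradients are all chosen only to keep the sketch self-contained.

```python
import numpy as np

HIDDEN = 8
rng = np.random.default_rng(0)

def mlp(w, x):
    # One-hidden-layer tanh MLP with all parameters packed in a flat vector.
    W1, b1 = w[:HIDDEN, None], w[HIDDEN:2 * HIDDEN]
    W2, b2 = w[2 * HIDDEN:3 * HIDDEN], w[3 * HIDDEN]
    return np.tanh(x[:, None] @ W1.T + b1) @ W2 + b2

def nrae(w, x, y, lam):
    # Assumed NRAE form, evaluated via log-sum-exp for numerical stability.
    sq = lam * (mlp(w, x) - y) ** 2
    m = sq.max()
    return (m + np.log(np.mean(np.exp(sq - m)))) / lam

def num_grad(f, w, eps=1e-5):
    # Central finite differences -- adequate for a 25-parameter toy model.
    g = np.empty_like(w)
    for i in range(w.size):
        d = np.zeros_like(w)
        d[i] = eps
        g[i] = (f(w + d) - f(w - d)) / (2 * eps)
    return g

x = np.linspace(-1.0, 1.0, 32)
y = np.sin(3.0 * x)                       # toy function approximation task
w = rng.normal(scale=0.5, size=3 * HIDDEN + 1)
mse_init = np.mean((mlp(w, x) - y) ** 2)

# Convexify (large lam), then deconvexify back toward plain MSE.
for lam in (50.0, 10.0, 1.0, 0.1):
    for _ in range(200):
        w -= 0.05 * num_grad(lambda v: nrae(v, x, y, lam), w)

mse_final = np.mean((mlp(w, x) - y) ** 2)
```

Note the design choice: the final training stage at λ = 0.1 is nearly MSE training, so the schedule ends on the original (deconvexified) criterion after the large-λ stages have steered the weights away from poor basins.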
Keywords/Search Tags:NRAE criterion, Neural, Training, Local minimum problem, Artificial, Convexification and deconvexification, ANN, Non-global local minima