
Optimal Function Approximation Using ReLU Neural Networks

Posted on: 2022-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liang
Full Text: PDF
GTID: 2480306764494694
Subject: Automation Technology
Abstract/Summary:
In recent years, deep learning has achieved state-of-the-art performance in a range of fields such as computer vision, recommender systems, and natural language processing. Despite these promising applied results, important open problems remain in neural network theory. The expressive power of neural networks, a central part of this theory, plays a vital role in understanding them. From the viewpoint of function approximation, expressive power describes a network's ability to approximate arbitrary functions. By the universal approximation theorem, a single-hidden-layer network that is wide enough can approximate the objective function to arbitrary accuracy. In practice, ever larger networks are deployed to reach higher accuracy, which raises fundamental questions about expressive power: What is the accuracy limit one can achieve with a network of a given size, and how fast does accuracy improve as the network grows? In mathematical language: what is the minimal approximation error a network can achieve, and how fast does that error decrease with network size? On the other hand, it is unclear whether current training techniques, e.g. stochastic gradient descent (SGD), can fully exploit the expressive power of neural networks; if not, how large is the gap between a network's training error and the minimal approximation error?

In view of the above problems, the main contributions of this thesis are as follows:

1. We introduce necessary and sufficient conditions for the optimal approximation of a convex function by a piecewise linear (PWL) function with n segments. From these conditions, upper and lower bounds on the optimal approximation error and the optimal approximation rate are obtained (a standard bound of this type is sketched after the abstract). Because neural networks are nonlinear, a ReLU network structure of fixed depth and fixed width is presented that generates the optimal approximating linear segments (see the second sketch below), and upper bounds on the optimal approximation error attainable with this structure are given.

2. Building on the optimal function approximation theory, we propose an algorithm for computing optimal approximations and analyze its convergence. Experiments validate its effectiveness and compare it with a classic optimal approximation algorithm. We also demonstrate that ReLU networks trained with SGD do not attain the theoretical limit of the approximation error (see the third sketch below), indicating that the expressive power of ReLU networks is not exploited to its full potential.

3. A method for dividing linear regions is proposed that ensures all samples are divided correctly. It is used to compute the average fitting error within each linear region of a network (see the fourth sketch below), which explains the difference in expressive power between different network structures.

4. For high-dimensional functions, the approximation error is measured experimentally. Results on different network structures with the same number of neurons demonstrate that deep networks possess stronger function-approximation ability than shallow networks.
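The abstract does not state the thesis's exact error bounds, but the flavor of contribution 1 can be illustrated with a standard uniform-knot result (a hedged illustration only, not the thesis's theorem; free-knot optimal approximation improves the constant but not the O(1/n^2) rate for convex C^2 functions):

```latex
% Piecewise linear interpolation error for f \in C^2[a,b] with
% n uniform segments of width h = (b-a)/n:
\left\| f - p_n \right\|_{\infty}
  \le \frac{h^{2}}{8}\,\max_{x\in[a,b]}\lvert f''(x)\rvert
  = \frac{(b-a)^{2}}{8\,n^{2}}\,\max_{x\in[a,b]}\lvert f''(x)\rvert
```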
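The network construction in contribution 1 is also easy to illustrate: any continuous PWL function with n breakpoints is exactly a one-hidden-layer ReLU network with n hidden units plus an affine skip term. A minimal numpy sketch (the function and parameter names here are ours; the thesis's fixed-depth, fixed-width construction may differ):

```python
import numpy as np

def pwl_as_relu_net(x0, y0, slopes, breakpoints):
    """Represent a continuous piecewise linear function as a
    one-hidden-layer ReLU network:
        f(x) = y0 + s0*(x - x0) + sum_i (s_{i+1} - s_i) * relu(x - b_i),
    where slopes[i] is the slope of the i-th piece and breakpoints
    are the sorted knots. Returns a callable evaluating the network.
    (Illustrative sketch, not the thesis's exact construction.)"""
    slopes = np.asarray(slopes, dtype=float)
    b = np.asarray(breakpoints, dtype=float)
    coeff = np.diff(slopes)                       # slope change at each knot

    def f(x):
        x = np.asarray(x, dtype=float)
        hidden = np.maximum(x[..., None] - b, 0.0)  # one ReLU unit per knot
        return y0 + slopes[0] * (x - x0) + hidden @ coeff

    return f

# Usage: the uniform-knot 3-segment PWL interpolant of x**2 on [0, 1]
# is realized exactly with just 2 hidden ReLU units plus the skip term.
f = pwl_as_relu_net(x0=0.0, y0=0.0,
                    slopes=[1/3, 1.0, 5/3],
                    breakpoints=[1/3, 2/3])
print(f(np.linspace(0.0, 1.0, 5)))
```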
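For the SGD gap in contribution 2, a small self-contained PyTorch experiment conveys the idea (all sizes, learning rates, and step counts here are illustrative assumptions, not the thesis's setup): with 8 hidden ReLU units the network could in principle realize an 8-breakpoint PWL approximant of f(x) = x^2, whose uniform-knot interpolation bound is 2/(8*64) ≈ 3.9e-3, yet plain SGD training typically stalls above that limit.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny one-hidden-layer ReLU net: 8 hidden units can express a PWL
# function with up to 8 breakpoints on [0, 1].
net = nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1))

x = torch.rand(1024, 1)
y = x ** 2
opt = torch.optim.SGD(net.parameters(), lr=0.1)

for step in range(20000):
    opt.zero_grad()
    loss = ((net(x) - y) ** 2).mean()
    loss.backward()
    opt.step()

# Compare the trained sup-norm error on a dense grid with the
# theoretical PWL interpolation bound for n = 8 uniform segments:
# (b-a)^2 * max|f''| / (8 n^2) = 2 / 512 ≈ 3.9e-3.
grid = torch.linspace(0, 1, 2001).unsqueeze(1)
with torch.no_grad():
    err = (net(grid) - grid ** 2).abs().max().item()
print(f"trained sup-norm error: {err:.2e}  (PWL bound ~3.9e-03)")
```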
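Contribution 3's region division can be realized, in the standard way, via activation patterns: inside one linear region every ReLU unit is fixed to be active or inactive, so samples sharing a pattern share a region. A sketch reusing the `net` and `grid` objects from the previous example (the thesis's exact method is not given in the abstract):

```python
import numpy as np
import torch

def region_errors(net, x, y):
    """Group samples by the ReLU activation pattern they induce
    (each distinct pattern = one linear region of the network) and
    return the mean absolute fitting error per non-empty region.
    Assumes `net` is an nn.Sequential of Linear/ReLU layers."""
    patterns = []
    h = x
    with torch.no_grad():
        for layer in net:
            h = layer(h)
            if isinstance(layer, torch.nn.ReLU):
                patterns.append((h > 0).numpy())
        preds = h
    sig = np.concatenate(patterns, axis=1)   # one signature per sample
    err = (preds - y).abs().numpy().ravel()
    out = {}
    for key in np.unique(sig, axis=0):       # unique activation patterns
        mask = (sig == key).all(axis=1)
        out[tuple(key.astype(int))] = err[mask].mean()
    return out

# Average fitting error per linear region that the grid actually hits:
errs = region_errors(net, grid, grid ** 2)
print(len(errs), "non-empty linear regions;",
      "worst region error:", max(errs.values()))
```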
Keywords/Search Tags: deep learning theory, ReLU networks, expressive power, optimal approximation, linear region