
An information theoretic approach to neural network design

Posted on: 1997-09-19    Degree: Ph.D    Type: Dissertation
University: Stanford University    Candidate: Cunha, Fernando B. L.    Full Text: PDF
GTID: 1468390014482253    Subject: Electrical engineering
Abstract/Summary:
The traditional design of neural networks follows a two-step training procedure. In the first step, the initial weight values of a network are chosen at random. In the second step, these weight values are modified so that the network's outputs, in response to a set of input patterns, match a set of target output patterns as closely as possible. Despite much success, the traditional design has exhibited weaknesses stemming from the underlying problem of ill-conditioning, which appears to be intrinsic to neural network training problems.

This dissertation proposes a new network training procedure that adds an intermediate step to the traditional two-step procedure. In this intermediate step, weight values are modified to condition the network to become maximally sensitive to the input patterns used to train it. The step is implemented using a performance measure rooted in information theory and is capable of minimizing ill-conditioning before the final step of training is executed. This step is termed pre-conditioning. Theoretical and experimental analysis of pre-conditioning suggests that the procedure lessens problems with local optima and dramatically reduces network training time.

In addition, this dissertation introduces layered learning, which consists of training each layer of a multilayer neural network individually, and is shown to have a remarkably positive effect on network training time. It also introduces two fundamental pseudo quantities: the pseudodeterminant, the determinant of a rectangular matrix; and pseudoentropy, the amount of disorder on the output surface of an arbitrary mapping. Furthermore, it discusses some analytic properties of neural network layers and provides alternative proofs of generalized versions of the Pythagorean Theorem and the Triangle Inequality. Finally, this dissertation proposes a new non-parametric method of probability density function estimation based on maximum entropy arguments.
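The abstract defines the pseudodeterminant only in passing, as "the determinant of a rectangular matrix." One common construction consistent with that phrase is the product of the nonzero singular values, which reduces to the absolute value of the ordinary determinant for square nonsingular matrices. The sketch below assumes this SVD-based definition; the function name and tolerance are illustrative and not taken from the dissertation:

```python
import numpy as np

def pseudodeterminant(A, tol=1e-12):
    """Product of the nonzero singular values of a (possibly
    rectangular) matrix -- an assumed generalization of the
    determinant, not necessarily the dissertation's exact definition."""
    s = np.linalg.svd(A, compute_uv=False)
    s = s[s > tol]  # discard numerically-zero singular values
    return float(np.prod(s))

# For a square nonsingular matrix this agrees with |det(A)|:
A = np.array([[2.0, 0.0], [0.0, 3.0]])
print(pseudodeterminant(A))  # 6.0

# It is also defined for rectangular matrices, where det(A) is not:
B = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
print(pseudodeterminant(B))  # 2.0
```

Because singular values measure how a mapping stretches its input space, a quantity like this plausibly connects to the abstract's theme of conditioning: a matrix with tiny singular values is ill-conditioned, and the discarded near-zero values flag exactly those directions.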
Keywords/Search Tags: Network, Training, Weight values