
The Upstart algorithm for pattern recognition in continuous, multiclass domains

Posted on: 2002-05-17
Degree: Ph.D.
Type: Dissertation
University: University of Louisiana at Lafayette
Candidate: Fanguy, Ronnie Anthony
Full Text: PDF
GTID: 1468390011998476
Subject: Computer Science
Abstract/Summary:
Upstart is an algorithm designed to build a classifier for two-class problems in which the data sets contain only binary data. Upstart builds a network of linear threshold units in which the errors occurring at a node are corrected by adding up to two new nodes that serve as consultants to the erroneous node. Although Upstart's initial design focuses on two-class problems with binary data sets, a few modifications reveal its potential to solve more general pattern recognition problems. We present extensions to Upstart that enable it to be used for problem domains in which the data sets contain continuous attributes and for problem domains requiring multiclass prediction.

To accomplish this, we explore and address issues that arise when applying Upstart to linear threshold units in two-class domains, including: (1) a tendency to produce duplicate or near-duplicate nodes, (2) the need for appropriate stopping criteria, and (3) a propensity toward imbalanced training sets.

We propose extensions to Upstart that deal with each of these issues. The extensions that provide the most improvement to Upstart are: (1) restricting the training sets of consultant nodes, (2) using the geometric mean as the measure of accuracy when training, and (3) preventing the production of duplicate nodes.

Additionally, we examine the usefulness of Upstart networks composed of classifiers other than linear threshold units for two-class domains. We experiment with decision trees, single-attribute tests, nearest neighbor classifiers, and a stacked combination of a decision tree and a number of nearest neighbor classifiers. Except for decision trees, all of these node types perform well when used with Upstart.

Finally, we extend Upstart to multiclass domains by examining extensions that enable each node of an Upstart network to predict more than two class labels, as well as the consequences of doing so. A version of each alternative node type successfully used in two-class domains is adapted to the multiclass version of Upstart. After introducing these extensions, we compare our solution to an alternative solution, MUpstart.
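
The abstract names the geometric mean as the accuracy measure used when training on imbalanced sets but does not define it. The sketch below assumes the common two-class definition, the square root of the product of the per-class accuracies; the dissertation's exact formula may differ. The function name `geometric_mean_accuracy` and the toy data are hypothetical, chosen only for illustration.

```python
import math


def geometric_mean_accuracy(y_true, y_pred):
    """Assumed definition: sqrt(accuracy on class 1 * accuracy on class 0)."""
    pos_total = sum(1 for t in y_true if t == 1)
    neg_total = sum(1 for t in y_true if t == 0)
    if pos_total == 0 or neg_total == 0:
        raise ValueError("both classes must be present in y_true")
    pos_correct = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    neg_correct = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return math.sqrt((pos_correct / pos_total) * (neg_correct / neg_total))


# Hypothetical imbalanced toy set: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A node that always predicts the majority class.
always_negative = [0] * len(y_true)
plain_accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
gmean_majority = geometric_mean_accuracy(y_true, always_negative)

# A node that finds one of the two positives and makes no other errors.
one_positive_found = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
gmean_partial = geometric_mean_accuracy(y_true, one_positive_found)
```

On this toy set, the majority-class node looks reasonable under plain accuracy but scores zero under the geometric mean, because it misclassifies every example of the minority class. This is one plausible reason such a measure helps when consultant nodes are trained on imbalanced sets.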
Keywords/Search Tags: Upstart, Multiclass, Domains, Data sets, Linear threshold units, Extensions, Two-class