Font Size: a A A

Optimization methods in data mining and machine learning

Posted on:1998-09-15Degree:Ph.DType:Dissertation
University:Rensselaer Polytechnic InstituteCandidate:Bredensteiner, ErinFull Text:PDF
GTID:1468390014978480Subject:Mathematics
Abstract/Summary:
Large quantities of data are generated in business, industry, and government. The purpose of this research is to develop tools to interpret this data using mathematical programming techniques for machine learning. The fundamental problem of machine learning considered is the discrimination between the elements of two or more sets in {dollar}Rsp{lcub}n{rcub}.{dollar} Each dimension of the space represents a feature or attribute of the elements of the set. In this research, we propose several methods of determining both linear and nonlinear discrimination functions using mathematical programming.; First, we investigate the problem of constructing a linear discriminant function from a geometric perspective. Under this framework we show the close relationship between existing approaches: two linear programming methods: the robust linear programming problem (RLP) and the multisurface method (MSM); and a quadratic generalized optimal plane program (GOP). A novel variation of RLP is proposed (RLP-P) combining the benefits of all three approaches. Theoretical and computational studies of the methods are performed.; Two novel methods for constructing linear discriminants are proposed using parametric bilinear programming. In the first method, a parametric error function is introduced that combines minimizing the number of points misclassified and the magnitude of the misclassified points. The second method is a feature selection problem that explicitly minimizes the number of features in each multivariate decision. We propose two versions: feature minimization with bounded accuracy and limited feature minimization. A Frank-Wolfe method is used to solve the bilinear subproblems. Computational results compare favorably with linear programming and other popular classification methods.; Finally, we propose a piecewise-nonlinear classification function for discriminating between more than two sets. This function is found by mapping the original attributes into a higher dimensional space and performing piecewise-linear discrimination in that space. This approach is advantageous since when the problem is mapped into a higher dimension feature space the number of variables and constraints in the dual problem does not grow. We propose a novel approach using the two class linear programming problem RLP-P on the multiclass data. Computationally, the RLP-P method performed superior to a prior linear programming method as well as quadratic programming problems.
Keywords/Search Tags:Data, Method, Linear programming, Problem, RLP-P, Machine
Related items