Font Size: a A A

Use and analysis of new optimization techniques for decision theory and data mining

Posted on:2011-12-15Degree:Ph.DType:Dissertation
University:University of California, BerkeleyCandidate:Moreno Centeno, ErickFull Text:PDF
GTID:1448390002956117Subject:Engineering
Abstract/Summary:
This dissertation addresses important problems in decision theory and data mining. In particular, we focus on problems of the form: Each of several information sources provides evaluations or measurements of the objects in a universal set, and the objective is to aggregate these, possibly conflicting, evaluations into a consensus evaluation of each object in the universal set. In addition, we concentrate on the scenario where each source provides evaluations of only a strict subset of the objects; that is, each source provides an incomplete evaluation..;In order to define the consensus evaluation from a given set of incomplete evaluations, two distances are developed: the first is a distance between incomplete rankings (ordinal evaluations) and the second is a distance between incomplete ratings (cardinal evaluations). These two distances generalize Kemeny and Snell’s distance between complete rankings and Cook and Kress’ distance between complete ratings, respectively. Specifically, we introduce a set of natural axioms that must be satisfied by a distance between two incomplete rankings (ratings) and prove the uniqueness and existence of a distance satisfying such axioms. Given a set of incomplete rankings (ratings), the consensus ranking (rating) is defined as the complete ranking (rating) that minimizes the sum of distances to each of the given rankings (ratings). We provide several examples that show that the consensus ranking (rating) obtained by this approach is more intuitive than that obtained by other approaches.;Finding the consensus ranking is NP-hard, thus we develop two optimization methodologies to find the consensus ranking: one efficient approximation algorithm based on the separation-deviation model and one exact algorithm based on the implicit hitting set approach. In addition, we show that the optimization problem that needs to be solved in order to find the consensus rating is a special case of the separation-deviation model (hereafter SD model), which is solvable in polynomial time. In this sense, the herein developed theory (described in the previous paragraph) can be thought of an axiomatization of the SD model.;Three applications of the SD model are presented: rating the credit-risk of countries; customer segmentation; and ranking the participants in a student paper competition. In the credit-risk rating study, it is shown that the SD model leads to an improved aggregate rating with respect to several criteria. We compare the SD model with other aggregation methods and show the following: Although the SD model is a method to aggregate cardinal evaluations, the aggregate credit-risk ratings obtained by the SD model are also good with respect to “ordinal criteria”. Several properties of the SD model are proven, including the property that the aggregate rating obtained by the SD model agrees with the majority of agencies or reviewers, regardless of the scale used. The customer segmentation study shows how to use the SD model to process data on customer purchasing timing. The outcome of the SD model provides insights on the rate of new product adoption by the company’s consumers. In particular, the SD model is used as follows: given the purchase dates for each customer of several products, this information is aggregated in order to rate the customers with regard to their promptness to adopt new technology. We show that this approach outperforms unidimensional scaling—a widely used data mining methodology. We analyze the results with respect to various dimensions of the customer base and report on the generated insights. The last presented application illustrates our aggregation methods in the context of the 2007 MSOM’s student paper competition. The aggregation problem in this competition poses two challenges. First, each paper was reviewed only by a very small fraction of the judges; thus the aggregate evaluation is highly sensitive to the subjective scales chosen by the judges. Second, the judges provided both cardinal and ordinal evaluations (ratings and rankings) of the papers they reviewed. This chapter develops the first known methodology to simultaneously aggregate ordinal and cardinal evaluations into a consensus evaluation. Although the content of this dissertation is framed in terms of decision theory, Hochbaum showed that data mining problems can be viewed as special cases of decision theory problems. In particular, the customer segmentation study is a classic data mining problem.
Keywords/Search Tags:Data mining, Decision theory, SD model, Customer segmentation, Problem, Particular, Evaluations, New
Related items