Data science for imbalanced data: Methods and applications

Posted on:2017-12-01

Degree:Ph.D

Type:Dissertation

University:University of Notre Dame

Candidate:Johnson, Reid A


GTID:1468390011999826

Subject:Computer Science

Abstract/Summary:

Data science is a broad, interdisciplinary field concerned with extracting knowledge and insight from data, and classification is one of its core tasks. One of the most persistent challenges in classification is the class imbalance problem. Class imbalance arises when the classes in a data set do not appear with roughly equal frequency, and it poses serious challenges for classification that must be addressed in order to provide knowledge and insight. These challenges are compounded by how widespread class imbalance is: it pervades almost every area of investigation within the purview of data science. In this dissertation, we investigate and counter class imbalance in several domains, touching upon areas as diverse as ecological informatics, scientific impact prediction, healthcare analytics, and education. Over the course of the dissertation, we develop and apply a variety of methods and techniques for combating the class imbalance that is endemic to these and other domains. We also compare and contrast our approaches with the more traditional approaches typically employed within each domain. Before beginning, however, we present a preliminary overview of the class imbalance problem and of the algorithms, methods, and evaluation metrics generally employed to address it.
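
To make the definition of class imbalance concrete, here is a minimal Python sketch. It is not taken from the dissertation: the label counts (95 negative, 5 positive), the always-majority classifier, and the helper names `imbalance_ratio`, `accuracy`, and `recall` are all assumptions chosen for illustration. It shows one common reason evaluation metrics matter under imbalance: a classifier that ignores the minority class can still score high accuracy.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Count of the most frequent class divided by count of the least frequent."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of truly positive examples that were predicted positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


# Hypothetical data set: 95 majority-class (0) and 5 minority-class (1) examples.
y_true = [0] * 95 + [1] * 5

# Hypothetical trivial classifier that always predicts the majority class.
y_pred = [0] * len(y_true)

ratio = imbalance_ratio(y_true)           # 95 / 5
acc = accuracy(y_true, y_pred)            # 95 of 100 correct
rec = recall(y_true, y_pred, positive=1)  # no minority examples found

print(f"imbalance ratio: {ratio:.1f}")
print(f"accuracy:        {acc:.2f}")
print(f"minority recall: {rec:.2f}")
```

In this toy setting the accuracy is high while the minority-class recall is zero, which is why metrics other than plain accuracy are generally used when classes are imbalanced.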

Keywords/Search Tags:

Imbalance, Class, Data, Science, Methods, Problem

Related items

1	A balanced approach to the multi-class imbalance problem
2	Combating the class imbalance problem in small sample data sets
3	Research On Data Imbalance In Visual Tracking
4	Relationships Between Evaluation Criteria Of Feature Selection And Analysis On Class Imbalance Problem Over VHR Remote Sensing Imagery
5	A Data Augmentation Method For Image Class Imbalance Problem Using Generative Adversarial Networks
6	Imbalanced Data Learning Based On Kernel Methods
7	Studying Class Imbalance Characteristics And Classification Methods On Internet Traffic Flows
8	Research On Multi-class Imbalance Learning
9	The Research Of Class Imbalance Classification Model In Data Mining
10	Research On Contrast Pattern-based Classification For Imbalanced Data