
Novel Techniques for Improving Classification Systems by Incorporating Experts

Posted on: 2014-02-17
Degree: Ph.D.
Type: Dissertation
University: Polytechnic Institute of New York University
Candidate: Attenberg, Joshua M.
Full Text: PDF
GTID: 1458390005992573
Subject: Computer Science
Abstract/Summary:
This manuscript presents novel techniques for incorporating the domain knowledge and wisdom of human "oracles" into the data mining workflow. Tasked with building predictive models for a real-world, web-scale prediction task, we quickly realized that many data mining techniques, including state-of-the-art research, fail to perform as advertised: assumptions that can be made in the lab often do not hold in reality. To overcome these difficulties, we needed to employ human effort in clever ways, addressing unexpected deficiencies in collecting data for model training, performing predictions, and evaluating the quality of a model's predictions.

Leveraging human knowledge for data mining or machine learning tasks is by no means new. Typically, constructing and monitoring a predictive machine learning system requires labeled example data. While some settings yield labels naturally, in others human effort must be employed to "manually" examine each instance under consideration and apply an appropriate label. These labeled instances are most often used during or prior to the training phase of the data mining process, supplying the data considered during model induction. Gathering labels for selected examples, however, is not the only way human effort can improve a data mining system. Humans can actively seek out examples they believe will be useful for a model's training. Labeled examples can also be gathered for a model deployed in production, generating performance estimates and building a better understanding of how the model behaves. Finally, human labels can substitute for a model's imperfect predictions, applying human expertise at inference time.

In the following research, we identify several deficiencies in existing techniques for gathering training data for data mining systems and offer alternatives that we demonstrate to be far more effective. We also expose problems in traditional model evaluation, problems that are particularly acute in web-scale prediction tasks, and provide an alternative approach that uses a gamified design to aid the task of evaluating a model. Finally, we present a novel setting for applying human resources to predictive inference, give a utility-optimizing approach, and demonstrate that this approach is, in fact, also a good way of gathering additional training data for model improvement.

The techniques presented here are proven not only through simulation in a laboratory setting but in reality: these ideas were forged from the demands of production. Their use in a production system validated them far beyond what is typical for machine learning research. Still, to demonstrate that the ideas discussed here generalize to a variety of tasks, we support our claims with a variety of simulations.
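The abstract does not specify how the utility-optimizing allocation of human effort at inference time works. As an illustration only, the sketch below shows one common way such a decision rule can be framed: route an instance to a human oracle whenever the expected cost of acting on the model's prediction exceeds the cost of obtaining a human label. All function names and cost parameters here are hypothetical, not taken from the dissertation.

```python
# Illustrative sketch (not the dissertation's actual method): decide at
# inference time whether to act on a model's prediction or pay for a
# human label, by comparing expected misclassification cost against
# the cost of human labeling.

def expected_error_cost(p_positive: float, cost_fp: float, cost_fn: float) -> float:
    """Expected cost of acting on the model's prediction for one instance.

    If we predict the more likely class, the expected cost is the
    probability of the other class times the cost of that mistake.
    """
    if p_positive >= 0.5:
        # Predict positive; the risk is a false positive.
        return (1.0 - p_positive) * cost_fp
    # Predict negative; the risk is a false negative.
    return p_positive * cost_fn


def route_to_human(p_positive: float, cost_fp: float, cost_fn: float,
                   human_label_cost: float) -> bool:
    """Send the instance to a human oracle when that lowers expected cost."""
    return expected_error_cost(p_positive, cost_fp, cost_fn) > human_label_cost


if __name__ == "__main__":
    # A model only 60% confident on a decision with an expensive false
    # positive gets escalated: expected cost 0.4 * 10.0 = 4.0 > 1.0.
    print(route_to_human(p_positive=0.6, cost_fp=10.0, cost_fn=2.0,
                         human_label_cost=1.0))  # True
```

Note that under such a rule, every instance routed to a human also yields a labeled example, which is consistent with the abstract's observation that inference-time human labeling doubles as a way of gathering additional training data.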
Keywords/Search Tags: Techniques, Data mining, Novel, Human, System, Model