Cost-sensitive information acquisition in structured domains

Posted on:2011-12-07

Degree:Ph.D

Type:Thesis

University:University of Maryland, College Park

Candidate:Bilgic, Mustafa

GTID:2468390011471387

Subject:Artificial Intelligence

Abstract/Summary:

Many real-world prediction tasks require collecting information about the domain entities to achieve better predictive performance. Collecting this additional information is often costly in money, time, or risk, and it involves acquiring the features that describe the entities and annotating the entities with target concepts and labels. For example, document collections need to be manually annotated for document classification, and lab tests need to be ordered for medical diagnosis. Annotating the whole document collection or ordering all possible lab tests might be infeasible due to limited resources, or may prove unnecessary. Thus, we need to be selective about which entities we annotate and which features we acquire. In this thesis, I explore effective and efficient ways of choosing the right information to acquire under limited resources. Specifically, I develop and empirically evaluate algorithms for feature and label acquisition in structured domains.

For the problem of feature acquisition, we are given entities with missing features, and the task is to classify them with minimum misclassification cost. Acquiring features reduces the likelihood of misclassification, but it incurs costs of its own. The objective is to acquire the set of features that best balances acquisition cost against misclassification cost. Because finding the optimal solution is intractable in general, most previous approaches have been greedy. However, greedy approaches often get stuck in local minima and cannot naturally address the practical scenario in which more than one feature needs to be acquired. I introduce a technique that reduces the space of candidate feature sets to consider for acquisition by exploiting the conditional independence properties of the underlying probability distribution.

For the problem of label acquisition, I consider two real-world scenarios. In the first, we are given a previously trained model and a budget that determines how many labels we can acquire, and the objective is to choose the labels to acquire so that accuracy on the remaining entities is maximized. In this setup, the entities appear in a network, and acquiring the label of one entity helps determine the correct labels of other entities in the network. I describe a system that automatically learns to predict on which entities the underlying classifier is likely to make mistakes, and that suggests acquiring the labels of entities lying in a high-density, potentially misclassified region.

In the second scenario, we are given a network of unlabeled entities, and the objective is to learn a classification model with the least expected future error while acquiring as few labels as possible. I describe an active learning technique that exploits the relationships in the network both to select informative entities to label and to learn a collective classifier that utilizes the label correlations in the network.
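
To make the feature-acquisition trade-off concrete, the following is a minimal sketch of the greedy (myopic) baseline that the abstract contrasts with, not the technique introduced in the thesis. It assumes a toy naive Bayes model with a binary class and binary features. All names and numbers (test_a, test_b, the costs, the probabilities) are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy model (assumption): binary class Y, binary features that are
# conditionally independent given Y (naive Bayes).
PRIOR: List[float] = [0.7, 0.3]                  # P(Y=0), P(Y=1)
LIKELIHOOD: Dict[str, List[float]] = {           # P(feature=1 | Y=y) for y = 0, 1
    "test_a": [0.1, 0.9],
    "test_b": [0.4, 0.6],
}
FEATURE_COST: Dict[str, float] = {"test_a": 5.0, "test_b": 1.0}
# MISCLASS_COST[d][y]: cost of predicting d when the true class is y.
MISCLASS_COST: List[List[float]] = [[0.0, 100.0],
                                    [20.0, 0.0]]


def feature_prob(name: str, value: int, y: int) -> float:
    """P(feature=value | Y=y) under the toy model."""
    p_one = LIKELIHOOD[name][y]
    return p_one if value == 1 else 1.0 - p_one


def posterior(observed: Dict[str, int]) -> List[float]:
    """P(Y | observed features) by Bayes' rule."""
    joint = list(PRIOR)
    for name, value in observed.items():
        for y in range(2):
            joint[y] *= feature_prob(name, value, y)
    total = sum(joint)
    return [p / total for p in joint]


def decision_cost(post: List[float]) -> float:
    """Expected misclassification cost of the best decision under post."""
    return min(sum(post[y] * MISCLASS_COST[d][y] for y in range(2))
               for d in range(2))


def expected_cost_after(name: str, observed: Dict[str, int]) -> float:
    """Expected misclassification cost if feature `name` were acquired next."""
    post = posterior(observed)
    total = 0.0
    for value in (0, 1):
        p_value = sum(post[y] * feature_prob(name, value, y) for y in range(2))
        if p_value > 0.0:
            total += p_value * decision_cost(posterior({**observed, name: value}))
    return total


def best_feature_to_acquire(observed: Dict[str, int]) -> Tuple[Optional[str], float]:
    """Greedy step: pick the single feature with the largest positive net gain.

    Net gain = reduction in expected misclassification cost - acquisition cost.
    Returns (None, 0.0) when no single feature is worth its cost.
    """
    current = decision_cost(posterior(observed))
    best_name, best_gain = None, 0.0
    for name in LIKELIHOOD:
        if name in observed:
            continue
        gain = current - expected_cost_after(name, observed) - FEATURE_COST[name]
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain
```

With nothing observed, `best_feature_to_acquire({})` selects `test_a` in this toy setup, with a net gain of about 4.6: the expensive test is still worth buying because it removes most of the expected misclassification cost. Because the procedure scores one feature at a time, it can miss sets of features that are only valuable together, which is the limitation of greedy approaches that the abstract points out.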

Keywords/Search Tags:

Entities, Information, Acquisition, Network, Label, Cost

Related items

1	Research On The Automatic Acquisition Of Domain Entities
2	A Research On Acquisition Of Geographical Knowledge Between Geographical Entities Based On Semantic Grammar
3	Multi-Label Feature Selection And Its Label-Specific Acquisition Algorithm
4	An efficient and privacy-preserving framework for information dissemination among independent entities
5	Plasmonics Powered Hybrid Platform for Label Free Bio-Sensin
6	Research On Cost-sensitive Multi-label Classification Algorithms And Applications To Tag Recommendations
7	A tale of two paradigms: Disambiguating extracted entities with applications to a digital library and the Web
8	Research On Collection Method Of Drug Bottle Label Information
9	Research On Acquisition And Application Of Label Correlation In Multi-label Learning
10	Fast Multi-label Text Classification Algorithm Based On Cost Sensitive