Adaptively finding and combining first-order rules for large, skewed data sets

Posted on:2010-03-06

Degree:Ph.D.

Type:Thesis

University:The University of Wisconsin - Madison

Candidate:Oliphant, Louis Tyrrell


GTID:2448390002976821

Subject:Artificial Intelligence

Abstract/Summary:

Inductive Logic Programming (ILP) is a machine-learning approach that uses first-order logic to create human-readable rules from a database of information and a set of positive and negative examples. This thesis explores creating ensembles of rules to maximize the area under the recall-precision curve (AURPC), a metric that focuses specifically on the coverage and accuracy of labeling the positive examples.

I create an ensemble of rules from a wide range of recall values and combine them to maximize AURPC. Gleaning rules that traditional ILP algorithms would discard and combining them into a single ensemble produces improved predictive performance while reducing the number of rules evaluated.

I evaluate several methods for finding sets of clauses that work well together. One method applies a probability distribution over the space of rules and selects rules more likely to improve Gleaner's performance. A second method follows a boosting framework and repeatedly re-weights examples in order to maximize AURPC. Merging the rule-combining and rule-search portions finds good candidate rules and shows improvement over the Gleaner algorithm.

I apply these first-order ensemble techniques to several data sets from two very different domains. The first data sets come from the Information-Extraction (IE) domain, where the task is to find specific relationships in text. The next data sets come from the computer-assisted medical-diagnosis domain, where the task is to identify findings on a mammogram as malignant or benign given descriptors of the findings, patient risk factors, the radiologist's score, and information from any previous mammograms.

I also include my work with Davis et al.'s SAYU algorithm. I demonstrate methods to improve predictive performance and to increase understanding of malignancy indicators. I show that computer models trained on data from one institution are able to outperform radiologists at another institution even when no additional data are available from the new institution.
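
As an illustration of the AURPC metric named in the abstract, the following minimal Python sketch computes the area under a recall-precision curve from scored examples. It is not taken from the thesis: the function names `pr_points` and `aurpc` are hypothetical, and the step-wise summation used here is only one common approximation, which may differ from the interpolation the thesis uses.

```python
from typing import Sequence


def pr_points(scores: Sequence[float], labels: Sequence[int]) -> list[tuple[float, float]]:
    """Return (recall, precision) pairs, one per distinct score threshold.

    `labels` uses 1 for positive examples and 0 for negative examples.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")

    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = []
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


def aurpc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise area under the recall-precision curve (an assumed approximation)."""
    area = 0.0
    prev_recall = 0.0
    for recall, precision in pr_points(scores, labels):
        # Each threshold contributes its gain in recall times its precision.
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area
```

Because the measure only counts true positives and false positives, the number of true negatives never enters the calculation, which is why it suits skewed data sets where negatives greatly outnumber positives.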

Keywords/Search Tags:

Data, Rules, First-order, Combining

Related items

1	Research On Diversity Combining Techniques In Wireless Communications
2	Association Rules Detecting Based On Attribute Topology
3	A general theory for evaluating joint data interaction when combining diverse data sources
4	Combining kernels for classification
5	The Research And Application Of Data Mining In Mining Rules Of Medical Diagnosis
6	Research On Data Mining Based Decision Rules And Association Rules
7	Research On Association Rules Mining Of Big Data
8	The Analysis And Application Of Data Integration For Order Response System
9	Study Of Application Of A Language Model Combining Statistics And Rules In Chinese Input Method
10	Research Of Data Mining Based On Relational Rules