
New techniques in intelligent information filtering

Posted on: 2004-07-17
Degree: Ph.D.
Type: Thesis
University: Rutgers The State University of New Jersey - New Brunswick
Candidate: Macskassy, Sofus Attila
Full Text: PDF
GTID: 2468390011470281
Subject: Computer Science
Abstract/Summary:
Intelligent information filtering is the process of receiving or monitoring large amounts of dynamically generated information and extracting the subset that would interest a user, based on some specified information need. Historically, this need has been based on user profiles that are directly evaluable: the information can be immediately classified as interesting or not. In this thesis I introduce a new type of user interestingness criterion that is prospective: the criterion defines the interestingness of an information item based on events that happen after the item appears. Hence, the interestingness cannot be evaluated directly when the item arrives. I describe a new technique that takes such a criterion and operationalizes it, using machine learning to generate a predictive model that can directly evaluate a piece of information. I show that, on five information filtering case studies, this technique performs statistically significantly better than the baseline of predicting based on class distribution.

However, a predictive model is only as good as the trust that its user puts in it. Many predictive models are opaque, in the sense that they are not easily understood by or explained to a human. Thus, I introduce a technique for taking an opaque model and generating a small set of rules that attempt to replicate its performance. I show that the rules generated by my technique on the five case studies are plausible representations of the predictive model and help explain how it works.

Finally, since many information filtering tasks involve primarily text, I have developed a new technique for enabling the use of numerical features in text classification, a technology often used to generate predictive models for information filtering. This technique converts a number into a bag of tokens such that numbers close to each other have high overlap in their tokens and numbers far apart do not. I show that this approach improves performance significantly over using only text and, further, that it is competitive with state-of-the-art numerical classifiers such as C4.5 and Ripper on purely numerical classification problems that do not involve text at all.
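
The abstract does not say how the tokens are constructed, so the following is only a minimal sketch of one way the stated property could be obtained, not the thesis's actual method. It assumes a fixed list of cut points (the hypothetical `thresholds`) and emits one token per cut point recording which side of it the number falls on. The names `number_to_tokens` and `overlap` are likewise invented for illustration.

```python
def number_to_tokens(value, thresholds, name="num"):
    """Map a number to a set of tokens, one per threshold.

    Each token records whether `value` is <= or > that threshold.
    This encoding is an illustrative assumption, not the thesis's scheme.
    """
    tokens = set()
    for t in thresholds:
        side = "le" if value <= t else "gt"
        tokens.add(f"{name}_{side}_{t}")
    return tokens


def overlap(a, b):
    """Jaccard overlap between two token sets."""
    return len(a & b) / len(a | b)


# Hypothetical cut points; in practice they might be derived from training data.
thresholds = [10, 20, 30, 40, 50, 60, 70, 80, 90]

# 41 and 44 fall on the same side of every threshold.
near = overlap(number_to_tokens(41, thresholds), number_to_tokens(44, thresholds))
# 41 and 55 disagree only on the threshold at 50.
mid = overlap(number_to_tokens(41, thresholds), number_to_tokens(55, thresholds))
# 5 and 95 fall on opposite sides of every threshold.
far = overlap(number_to_tokens(5, thresholds), number_to_tokens(95, thresholds))

print(near, mid, far)
```

Under this assumed encoding, two numbers share every token except those for thresholds lying between them, so overlap falls as the numbers move apart. The resulting tokens can be appended to a document's word tokens and passed to an ordinary bag-of-words text classifier.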
Keywords/Search Tags:Information, Technique, New, Text