New techniques in intelligent information filtering

Posted on:2004-07-17

Degree:Ph.D

Type:Thesis

University:Rutgers The State University of New Jersey - New Brunswick

Candidate:Macskassy, Sofus Attila

GTID:2468390011470281

Subject:Computer Science

Abstract/Summary:

Intelligent Information Filtering is the process of receiving or monitoring large amounts of dynamically generated information and extracting the subset that would interest a user, based on some specified information need. Historically, this need has been expressed through user profiles that are directly evaluable: an information item can be classified as interesting or not as soon as it arrives. In this thesis I introduce a new type of interestingness criterion that is prospective. Such a criterion defines the interestingness of an information item in terms of events that occur after the item appears, so the interestingness cannot be evaluated directly when the item arrives. I describe a new technique that operationalizes such a criterion, using machine learning to build a predictive model that can evaluate an information item directly. On five information filtering case studies, I show that this technique performs statistically significantly better than a baseline that predicts from the class distribution alone.

However, a predictive model is only as good as the trust that its user places in it. Many predictive models are opaque, in the sense that they are not easily understood by or explained to a human. I therefore introduce a technique that takes an opaque model and generates a small set of rules that attempt to replicate its performance. I show that on the five case studies the rules generated by this technique are plausible representations of the predictive model and help explain how it works.

Finally, since many information filtering tasks involve primarily text, I have developed a new technique that enables numerical features to be used in text classification, a technology often used to build predictive models for information filtering. This technique converts a number into a bag of tokens such that numbers close to each other share many tokens and numbers far apart do not. I show that this approach improves performance significantly over using text alone and, further, that it is competitive with state-of-the-art numerical classifiers such as C4.5 and Ripper on purely numerical classification problems that involve no text at all.
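
The abstract does not specify how a number is converted into tokens. The following is a minimal sketch of one way to obtain the stated property (nearby numbers share many tokens, distant ones share few). It assumes multi-resolution binning over a known range [lo, hi], with a second grid at each level shifted by half a bin. The function names `number_to_tokens` and `token_overlap`, the token format, and the range are hypothetical and not taken from the thesis.

```python
import math
from typing import List


def number_to_tokens(name: str, x: float, lo: float, hi: float, levels: int = 4) -> List[str]:
    """Map a number to a bag of tokens via multi-resolution binning (hypothetical scheme).

    At level k the range [lo, hi] is split into 2**k equal-width bins. Each level
    uses two grids, one aligned with lo and one shifted by half a bin, so values
    that straddle a single bin boundary still share a token at that level.
    """
    span = hi - lo
    if span <= 0:
        raise ValueError("hi must be greater than lo")
    tokens: List[str] = []
    for k in range(1, levels + 1):
        width = span / (2 ** k)
        pos = (x - lo) / width  # position measured in bin widths
        tokens.append(f"{name}_L{k}_a{math.floor(pos)}")        # aligned grid
        tokens.append(f"{name}_L{k}_s{math.floor(pos + 0.5)}")  # half-shifted grid
    return tokens


def token_overlap(a: List[str], b: List[str]) -> float:
    """Jaccard overlap between two token bags (treated as sets)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def price_tokens(v: float) -> List[str]:
    """Hypothetical numeric field 'price' whose training values span 0 to 100."""
    return number_to_tokens("price", v, lo=0.0, hi=100.0)


if __name__ == "__main__":
    print(price_tokens(50.0))
    print(token_overlap(price_tokens(50.0), price_tokens(52.0)))  # nearby: high overlap
    print(token_overlap(price_tokens(50.0), price_tokens(90.0)))  # distant: low overlap
    print(token_overlap(price_tokens(10.0), price_tokens(90.0)))  # far apart: no overlap
```

Because every level has both an aligned and a half-shifted grid, two values closer than half a bin width always share at least one token at that level. This avoids the hard boundary effect of a single discretization. In a text classifier, the tokens would simply be added to the document's word tokens. The field-name prefix keeps them from colliding with ordinary words.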

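Returning to the first technique in the abstract: the abstract does not say how a prospective criterion is turned into training data. The following is a minimal sketch. It assumes that items and later events both carry timestamps and that the criterion can be written as a predicate over an (item, event) pair. Each historical item is then labeled interesting if a relevant event follows it within a fixed window. The `Item` and `Event` types, the four-hour window, the 0.05 threshold, and the example data are all hypothetical and not drawn from the thesis's case studies. Such labels can only be computed after the window has passed. That delay is why a model trained on them is needed to score new items as they arrive.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Item:
    """An incoming information item (hypothetical structure)."""
    text: str
    topics: Tuple[str, ...]
    time: datetime


@dataclass(frozen=True)
class Event:
    """Something observed after items arrive (hypothetical structure)."""
    topic: str
    magnitude: float
    time: datetime


def prospective_labels(
    items: Sequence[Item],
    events: Sequence[Event],
    window: timedelta,
    is_relevant: Callable[[Item, Event], bool],
) -> List[bool]:
    """Label an item interesting iff a relevant event occurs in (item.time, item.time + window]."""
    labels = []
    for item in items:
        end = item.time + window
        labels.append(any(
            item.time < ev.time <= end and is_relevant(item, ev) for ev in events
        ))
    return labels


def big_move_on_topic(item: Item, ev: Event) -> bool:
    """Hypothetical criterion: an event of magnitude >= 0.05 on one of the item's topics."""
    return ev.topic in item.topics and ev.magnitude >= 0.05


if __name__ == "__main__":
    t0 = datetime(2004, 1, 1, 9, 0)
    items = [
        Item("story A", ("acme",), t0),
        Item("story B", ("globex",), t0 + timedelta(hours=1)),
        Item("story C", ("acme",), t0 + timedelta(hours=5)),
    ]
    events = [
        Event("acme", 0.07, t0 + timedelta(hours=2)),
        Event("globex", 0.01, t0 + timedelta(hours=3)),
        Event("acme", 0.02, t0 + timedelta(hours=6)),
    ]
    labels = prospective_labels(items, events, timedelta(hours=4), big_move_on_topic)
    print(labels)  # [True, False, False]
    # Class prior: the only information a class-distribution baseline uses.
    print(sum(labels) / len(labels))
```

A classifier trained on item content against labels like these would then serve as the directly evaluable model. The class prior printed at the end is what a class-distribution baseline relies on, with no access to item content.
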
Keywords/Search Tags:

Information, Technique, New, Text

Related items

1	Research On Text Based Information Hiding Technique
2	Text Information Hiding Technique By The Carrier Of Song Poetry
3	New techniques in intelligent information filtering
4	The Key Techniques Research On Text Mining
5	Research Of Short Text Summary Generation Based On Text Structure Information
6	The Research And Implementation Of Text Classification Based On Meta-information And Optimization
7	The Research And Implementation Of Text Classification Based On Meta-Information And Optimization
8	Design Of Information Hiding Algorithms Based On Text
9	Research On The Key Techniques Of Web Information Intelligent Acquisition
10	Researching Text Classification Using Semantic And Sequence Information