An investigation of several document classification algorithms leading to the design of an autonomous software agent for locating specific, relevant information on the World Wide Web

Posted on:2002-12-29

Degree:Ph.D.

Type:Thesis

University:California Institute of Technology

Candidate:Lindal, John

Full Text:PDF

GTID:2468390011498871

Subject:Computer Science

Abstract/Summary:

The goal of the research described in this thesis was to design an autonomous software agent that can locate specific, relevant information on the World Wide Web. The first chapter provides the motivation behind this project and a brief overview of the challenges associated with it. The next chapter presents the analysis that led to the development of a new, improved version of the computer program called ITRule. The improvements consist of a new algorithm for classifying documents that outperforms the previous one; significantly enhanced support for data exploration, i.e., the process of extracting information from raw data; and a new algorithm for quantizing numeric variables so that they can be used by ITRule. The third part of this thesis compares the performance of three versions of ITRule, two versions of the Naive Bayes classifier, several neural networks, the decision tree algorithm called CART, and a linear support vector machine in order to determine which is best suited for selecting relevant web pages. An analysis of the test results shows that a new ITRule classification algorithm, based on cross validation combined with the J-measure, performs best. The fourth and final part of the thesis describes how some of these results were used in the design of a user-friendly, autonomous software agent called Poirot that can help World Wide Web users stay up to date on new developments in topics of interest.
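
The abstract names the J-measure without defining it. As a rough illustration only, the sketch below computes the J-measure as it is commonly defined in the ITRule literature for a rule of the form "if Y = y then X = x": the probability of the rule's condition multiplied by the relative entropy between the posterior and prior distributions of the binary event X = x. The function name `j_measure` and its parameter names are hypothetical, and the sketch does not reproduce the thesis's classifier, whose combination with cross validation the abstract does not specify.

```python
import math


def j_measure(p_y: float, p_x: float, p_x_given_y: float) -> float:
    """J-measure, in bits, of the rule 'if Y = y then X = x'.

    p_y          -- probability that the rule's condition Y = y holds
    p_x          -- prior probability of the conclusion X = x (0 < p_x < 1)
    p_x_given_y  -- probability of X = x given that Y = y

    Hypothetical helper for illustration; not taken from the thesis.
    """
    if not 0.0 < p_x < 1.0:
        raise ValueError("p_x must lie strictly between 0 and 1")

    def term(posterior: float, prior: float) -> float:
        # By convention, 0 * log(0 / q) contributes nothing.
        if posterior == 0.0:
            return 0.0
        return posterior * math.log2(posterior / prior)

    # Relative entropy between posterior and prior of the binary event X = x.
    divergence = term(p_x_given_y, p_x) + term(1.0 - p_x_given_y, 1.0 - p_x)

    # Weight by how often the rule fires.
    return p_y * divergence
```

Under this definition a rule scores highly only if it both fires often (large `p_y`) and shifts belief about the conclusion substantially; a rule whose posterior equals the prior scores zero.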

Keywords/Search Tags:

Autonomous software agent, World Wide Web, Algorithm, New, Relevant, Information

Related items

1	Students' success with World Wide Web search engines: Retrieving relevant results with respect to end-user relevance judgments
2	Incorporating quality metrics in agent-based centralized/distributed information retrieval on the World Wide Web
3	Creating a criterion-based information agent through data mining for automated identification of scholarly research on the World Wide Web
4	Research Of Small World Effect In World Wide Web
5	A World Wide Web server with embedded GIS
6	Information-seeking and the World Wide Web: A qualitative study of seventh-grade students' search behavior during an inquiry activity
7	An intelligent metasearch engine for the World Wide Web
8	Constructing virtual databases on the World-Wide Web
9	Adventure in cyberspace: Exploring the information content of the World Wide Web pages on the Internet
10	Impact of inference skills (abductive, inductive, deductive) on business information retrieval on the World-Wide Web