Creating a criterion-based information agent through data mining for automated identification of scholarly research on the World Wide Web

Posted on:2001-01-20

Degree:Ph.D

Type:Dissertation

University:University of North Texas

Candidate:Nicholson, Scott Richard

Full Text:PDF

GTID:1468390014954771

Subject:Mathematics

Abstract/Summary:

This dissertation creates an information agent that correctly identifies Web pages containing scholarly research approximately 96% of the time. It does this by analyzing each Web page against a set of criteria and then using a classification tree to arrive at a decision.

The criteria were gathered from the literature on selecting print and electronic materials for academic libraries. A Delphi study was conducted with an international panel of librarians to expand and refine the criteria until a list of 41 operationalizable criteria was agreed upon. A Perl program was then designed to analyze a Web page and determine a numerical value for each criterion.

A large collection of Web pages was gathered, comprising 5,000 pages that contain the full work of scholarly research and 5,000 random pages, representative of user searches, that do not contain scholarly research. Datasets were built by running the Perl program on these Web pages. The datasets were split into model-building and testing sets.

Data mining was then used to create different classification models. Four techniques were used: logistic regression, non-parametric discriminant analysis, classification trees, and neural networks. The models were created with the model-building datasets and then tested against the test dataset. Precision and recall were used to judge the effectiveness of each model. In addition, a set of pages that were difficult to classify because of their similarity to scholarly research was gathered and classified with the models.

The classification tree produced the most effective classification model, with a precision of 96% and a recall of 95.6%. However, logistic regression produced a model that correctly classified more of the problematic pages.

This agent can be used to create a database of scholarly research published on the Web. In addition, the technique can be used to create a database of any type of structured electronic information.
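The abstract judges each model by precision and recall. As a minimal sketch of how those two measures are computed for a binary "scholarly / not scholarly" classifier, the Python snippet below may help. The function name `precision_recall` and the toy labels are assumptions made for illustration; they are not taken from the dissertation, whose own analysis program was written in Perl.

```python
from typing import Sequence, Tuple


def precision_recall(actual: Sequence[bool],
                     predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (precision, recall) for the positive class (True = scholarly)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    # Count the three outcomes that matter for the positive class.
    true_pos = sum(1 for a, p in zip(actual, predicted) if a and p)
    false_pos = sum(1 for a, p in zip(actual, predicted) if not a and p)
    false_neg = sum(1 for a, p in zip(actual, predicted) if a and not p)

    # Precision: of the pages flagged as scholarly, how many really are.
    flagged = true_pos + false_pos
    precision = true_pos / flagged if flagged else 0.0

    # Recall: of the truly scholarly pages, how many were flagged.
    relevant = true_pos + false_neg
    recall = true_pos / relevant if relevant else 0.0

    return precision, recall


# Hypothetical toy labels, NOT data from the dissertation.
actual = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False]

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case three of the four flagged pages are truly scholarly and three of the four scholarly pages are flagged, so both measures come out to 0.75. The reported figures of 96% precision and 95.6% recall would follow from the same definitions applied to the test dataset.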

Keywords/Search Tags:

Scholarly research, Web, Information, Agent, Create, Used

Related items

1	An Architecture For Multimodal Information Extraction From Scholarly Document
2	On The Mode Of Scholarly Communication Based On Network
3	Analysis Of Postgraduates' Scholarly Reading Behavior And Demand
4	Papyrophiles, electrocentrics and philistines: The slow growth of electronic scholarly journals
5	Research And Implementation Of Information Agent Based On The JADE Platform
6	Academic Venues Recommendation Based On Scholarly Information Network
7	Pub+Lab: Study And Experiment On A Scholarly Communication Mechanism Integrating Scholarly Publishing With Research Lifecycle
8	Research And Application On Big Scholarly Data-based Key Technique Of Academic Search System
9	Modeling scholarly communications across heterogeneous corpora
10	Application Of Information Agent Technology To CSCW