
SAWTOOTH: Learning from huge amounts of data

Posted on: 2005-01-25
Degree: M.S.C.S.
Type: Thesis
University: West Virginia University
Candidate: Orrego, Andres Sebastian
Full Text: PDF
GTID: 2458390008492339
Subject: Computer Science
Abstract/Summary:
Data scarcity was a problem in data mining until recently. Now, in the era of the Internet and with tremendous advances in both data storage devices and high-speed computing, databases are filling up at rates never imagined before. The machine learning problems of the past have been joined by an increasingly important one: scalability. Extracting useful information from arbitrarily large data collections or data streams is now of special interest within the data mining community. In this research we find that mining such large datasets may actually be quite simple. We address the scalability issues of widely used batch learning algorithms and of the discretization techniques used to handle continuous values within the data. We then describe an incremental algorithm that addresses the scalability problem of Bayesian classifiers, and propose a Bayesian-compatible on-line discretization technique for continuous values; both follow a "simplicity first" approach and have very low memory (RAM) requirements.
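
To make the incremental, low-memory approach described above concrete, the following is a minimal sketch and not the thesis's actual SAWTOOTH method: an incremental naive Bayes classifier whose frequency counts are updated one instance at a time, paired with a simple on-line equal-width discretizer standing in for the thesis's Bayesian-compatible discretization technique. The class names, bin count, and equal-width scheme are illustrative assumptions.

# A minimal sketch, assuming an incremental naive Bayes classifier paired with
# a simple on-line equal-width discretizer; this is NOT the thesis's SAWTOOTH
# implementation, and the bin count and discretization scheme are illustrative.
from collections import defaultdict
import math


class OnlineDiscretizer:
    # Maps a continuous value to one of `bins` equal-width bins, widening the
    # observed range as values stream in. Bin boundaries therefore shift over
    # time; this simplification keeps memory constant per feature.
    def __init__(self, bins=10):
        self.bins = bins
        self.lo = None
        self.hi = None

    def bin(self, x):
        if self.lo is None:
            self.lo, self.hi = x, x
        self.lo = min(self.lo, x)
        self.hi = max(self.hi, x)
        if self.hi == self.lo:
            return 0
        width = (self.hi - self.lo) / self.bins
        return min(int((x - self.lo) / width), self.bins - 1)


class IncrementalNaiveBayes:
    # Only frequency counts are kept, updated one instance at a time, so the
    # whole dataset never has to fit in RAM.
    def __init__(self, bins=10):
        self.bins = bins
        self.n = 0
        self.class_counts = defaultdict(int)  # label -> count
        self.feature_counts = defaultdict(lambda: defaultdict(int))  # (label, feature) -> bin -> count
        self.discretizers = defaultdict(lambda: OnlineDiscretizer(bins))

    def update(self, features, label):
        self.n += 1
        self.class_counts[label] += 1
        for i, value in enumerate(features):
            b = self.discretizers[i].bin(value)
            self.feature_counts[(label, i)][b] += 1

    def predict(self, features):
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            # Log-space naive Bayes score with Laplace smoothing.
            score = math.log(count / self.n)
            for i, value in enumerate(features):
                b = self.discretizers[i].bin(value)
                seen = self.feature_counts[(label, i)].get(b, 0)
                score += math.log((seen + 1) / (count + self.bins))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Usage: feed instances one at a time, as if reading from a data stream.
model = IncrementalNaiveBayes(bins=8)
model.update([5.1, 3.5], "a")
model.update([6.7, 3.0], "b")
print(model.predict([5.0, 3.4]))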
Keywords/Search Tags: Data