
SAWTOOTH: Learning from huge amounts of data

Posted on: 2005-01-25
Degree: M.S.C.S.
Type: Thesis
University: West Virginia University
Candidate: Orrego, Andres Sebastian
Full Text: PDF
GTID: 2458390008492339
Subject: Computer Science
Abstract/Summary:
Data scarcity was a problem in data mining until recently. Now, in the era of the Internet and with tremendous advances in both data storage devices and high-speed computing, databases are filling up at rates never imagined before. The machine learning problems of the past have been joined by an increasingly important one: scalability. Extracting useful information from arbitrarily large data collections or data streams is now of special interest within the data mining community. In this research we find that mining such large datasets may actually be quite simple. We address the scalability issues of widely used batch learning algorithms and of the discretization techniques used to handle continuous values within the data. We then describe an incremental algorithm that addresses the scalability problem of Bayesian classifiers, and propose a Bayesian-compatible on-line discretization technique for continuous values; both follow a "simplicity first" approach and have very low memory (RAM) requirements.
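
To make the incremental, low-memory approach described above concrete, the following is a minimal sketch and not the thesis's actual SAWTOOTH method: an incremental naive Bayes classifier whose frequency counts are updated one instance at a time, paired with a simple on-line equal-width discretizer standing in for the thesis's Bayesian-compatible discretization technique. The class names, bin count, and equal-width scheme are illustrative assumptions.

# A minimal sketch, assuming an incremental naive Bayes classifier paired with
# a simple on-line equal-width discretizer; this is NOT the thesis's SAWTOOTH
# implementation, and the bin count and discretization scheme are illustrative.
from collections import defaultdict
import math


class OnlineDiscretizer:
    # Maps a continuous value to one of `bins` equal-width bins, widening the
    # observed range as values stream in. Bin boundaries therefore shift over
    # time; this simplification keeps memory constant per feature.
    def __init__(self, bins=10):
        self.bins = bins
        self.lo = None
        self.hi = None

    def bin(self, x):
        if self.lo is None:
            self.lo, self.hi = x, x
        self.lo = min(self.lo, x)
        self.hi = max(self.hi, x)
        if self.hi == self.lo:
            return 0
        width = (self.hi - self.lo) / self.bins
        return min(int((x - self.lo) / width), self.bins - 1)


class IncrementalNaiveBayes:
    # Only frequency counts are kept, updated one instance at a time, so the
    # whole dataset never has to fit in RAM.
    def __init__(self, bins=10):
        self.bins = bins
        self.n = 0
        self.class_counts = defaultdict(int)  # label -> count
        self.feature_counts = defaultdict(lambda: defaultdict(int))  # (label, feature) -> bin -> count
        self.discretizers = defaultdict(lambda: OnlineDiscretizer(bins))

    def update(self, features, label):
        self.n += 1
        self.class_counts[label] += 1
        for i, value in enumerate(features):
            b = self.discretizers[i].bin(value)
            self.feature_counts[(label, i)][b] += 1

    def predict(self, features):
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            # Log-space naive Bayes score with Laplace smoothing.
            score = math.log(count / self.n)
            for i, value in enumerate(features):
                b = self.discretizers[i].bin(value)
                seen = self.feature_counts[(label, i)].get(b, 0)
                score += math.log((seen + 1) / (count + self.bins))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Usage: feed instances one at a time, as if reading from a data stream.
model = IncrementalNaiveBayes(bins=8)
model.update([5.1, 3.5], "a")
model.update([6.7, 3.0], "b")
print(model.predict([5.0, 3.4]))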
Keywords/Search Tags: Data