Mining massive data streams

Posted on:2006-11-21

Degree:Ph.D

Type:Thesis

University:University of Washington

Candidate:Hulten, Geoffrey

GTID:2458390005992251

Subject:Computer Science

Abstract/Summary:

Many organizations today have more than very large databases; they have databases that grow without limit at a rate of several million records per day. Mining these continuous data streams brings unique opportunities, but also new challenges. In this thesis we develop a method that can semi-automatically enhance a wide class of existing learning algorithms so that they can learn from such high-speed data streams in real time. In particular, our method can be applied to essentially any induction algorithm based on discrete search. After applying our method, the algorithm: learns from data streams in an incremental, anytime fashion; runs in time independent of the amount of data seen, while making decisions that are essentially identical to those that would be made from infinite data; uses a constant amount of RAM no matter how much data it sees; and adjusts its learned models in a very fine-grained manner as the data-generating process changes over time. We evaluate our method by using it to produce a series of learning algorithms (for decision trees, Bayesian network structure, and clustering), all of which are capable of learning from high-speed data streams. We evaluate these learners with extensive studies on synthetic data sets, and by applying them to a collection of massive real-world mining tasks.
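
The abstract does not say how a stream learner decides that it has seen enough data to commit to one step of a discrete search. As a minimal sketch, and purely as an assumption for illustration, the Python below uses the Hoeffding bound, a standard statistical tool for this kind of decision, to choose between two candidates from a stream of scores while storing only a count and a running sum. The names `hoeffding_epsilon` and `pick_better`, the score range, and the default `delta` are hypothetical and are not taken from the thesis.

```python
import math


def hoeffding_epsilon(value_range, delta, n):
    """Hoeffding bound: with probability at least 1 - delta, the mean of n
    independent samples of a variable spanning value_range is within
    epsilon of its true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def pick_better(stream, delta=1e-6, score_range=1.0):
    """Read (score_a, score_b) pairs until one candidate leads by more than
    epsilon, then return (winner, examples_used).

    Assumes each score lies in an interval of width score_range, so the
    difference of two scores spans 2 * score_range. Memory use is constant:
    only the count and the running sum of differences are kept.
    """
    n = 0
    sum_diff = 0.0
    for score_a, score_b in stream:
        n += 1
        sum_diff += score_a - score_b
        mean_diff = sum_diff / n
        eps = hoeffding_epsilon(2.0 * score_range, delta, n)
        if mean_diff > eps:
            return "a", n
        if -mean_diff > eps:
            return "b", n
    # The stream ended before either candidate was clearly ahead.
    return None, n
```

The number of examples consumed depends on how far apart the two candidates are and on `delta`, not on how long the stream is, which is one way to read the abstract's claim of running time independent of the amount of data seen.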

Keywords/Search Tags:

Data streams, Mining

Related items

1	Research On Mining Algorithms Over Data Streams
2	Research And Implementation Of Frequent Pattern Mining Algorithms Over Data Streams
3	Research On Technique And Application Of Mining Data Streams
4	Mining Association Rules In Data Streams
5	The Research On The Related Problems Of Association Rule Mining Over Data Streams
6	The Research And Realization Of Clustering Algorithm In Data Streams Mining
7	Research On Classification Technologies In Mining Unsteady Data Streams
8	The Application And Research Of Incremental Clustering On Temporal Data Streams
9	Research On Frequent Pattern Mining Algorithms In Uncertain Data Streams
10	Study On Key Technologies Of Frequent Items Mining And Clustering On Data Streams