
Knowledge discovery from heterogeneous data streams using Fourier spectrum of decision trees

Posted on: 2002-02-10 | Degree: Ph.D. | Type: Dissertation
University: Washington State University | Candidate: Park, Byung-Hoon | Full Text: PDF
GTID: 1468390011492734 | Subject: Computer Science
Abstract/Summary:
Extracting knowledge from continuously arriving data has gained significant attention lately. Such data environments, often called data streams, include satellite images, financial data, Web logs, and network traffic data. Mining data streams presents both practical and theoretical challenges. It is crucial to process data immediately as it flows in, yet data often accumulate too fast to handle. In addition, since only a fraction of the data is available within any particular time period, we need an approach that properly aggregates the partial knowledge mined in different time frames. This dissertation proposes an ensemble model-based approach to mining knowledge from data streams. It uses decision trees and Fourier spectrum analysis to aggregate the trees in an ensemble efficiently. In particular, it points out that the representation of a decision tree in the Fourier basis has several useful properties that can be used to manipulate the trees. It offers algorithms to compute the Fourier spectrum of a decision tree and shows that multiple decision trees can be combined by simply adding their respective spectra. It also describes a novel technique for visualizing a complex ensemble model at various resolutions using the Fourier spectrum.

Many time-critical applications, such as sensor networks and process control, involve multiple data streams, so the ability to extract knowledge from multiple heterogeneous data streams is of paramount importance. The heterogeneous sources are often scattered, and data accumulate quickly. It is therefore almost infeasible to apply any data mining algorithm that requires centrally stored data. This dissertation attempts to mitigate the problem by employing a distributed data mining approach. In particular, it builds a distributed decision tree model that can be learned with only a modest amount of data exchange. It also describes basic requirements for an efficient, robust distributed data mining system and presents BODHI as one such implementation.
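
The claim that trees can be combined by adding their spectra follows from the linearity of the Fourier (Walsh-Hadamard) transform. The sketch below illustrates this on a small example. It is not the dissertation's algorithm: it assumes binary features, computes each spectrum by brute-force enumeration of all inputs, and uses two hypothetical hand-written trees (`tree_a`, `tree_b`) invented for illustration.

```python
from itertools import product

def tree_a(x):
    # Hypothetical tree: split on x[0], then on x[1]; x[2] is never tested.
    if x[0] == 0:
        return 1 if x[1] == 1 else 0
    return 1

def tree_b(x):
    # Hypothetical tree: split on x[2], then on x[0].
    if x[2] == 1:
        return 0
    return 1 if x[0] == 1 else 0

def parity_sign(j, x):
    # Value of the Fourier basis function for partition j at input x:
    # +1 if j and x share an even number of 1 bits, otherwise -1.
    return -1 if sum(a & b for a, b in zip(j, x)) % 2 else 1

def fourier_spectrum(f, n):
    # Brute force: w_j = 2^-n * sum over all x of f(x) * parity_sign(j, x).
    points = list(product((0, 1), repeat=n))
    return {
        j: sum(f(x) * parity_sign(j, x) for x in points) / len(points)
        for j in points
    }

def evaluate(spectrum, x):
    # Inverse transform: f(x) = sum over j of w_j * parity_sign(j, x).
    return sum(w * parity_sign(j, x) for j, w in spectrum.items())

def add_spectra(s1, s2):
    # Combine two models by adding coefficients partition by partition.
    return {j: s1[j] + s2[j] for j in s1}

N = 3  # number of binary features (an assumption of this sketch)
spec_a = fourier_spectrum(tree_a, N)
spec_b = fourier_spectrum(tree_b, N)
combined = add_spectra(spec_a, spec_b)

for x in product((0, 1), repeat=N):
    # The summed spectrum reproduces the sum of the two trees' outputs.
    print(x, tree_a(x) + tree_b(x), evaluate(combined, x))
```

In this sketch every coefficient of `tree_a` whose partition involves the third feature is zero, because that tree never tests it. Dividing the summed spectrum by the number of trees would give the spectrum of the ensemble average instead of the sum.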
Keywords/Search Tags: Data streams, Fourier spectrum, Decision trees, Distributed data mining