
Learning from data streams using transductive inference and martingale

Posted on: 2007-07-10
Degree: Ph.D.
Type: Dissertation
University: George Mason University
Candidate: Ho, Shen-Shyang
Full Text: PDF
GTID: 1458390005483664
Subject: Computer Science
Abstract/Summary:
Streaming data is ubiquitous: examples include data generated by wireless sensor networks, web logs and click streams, ATM transactions, phone call records, and multimedia video. One seeks to discover knowledge and to extract interesting patterns from the data stream. Such knowledge can be extremely useful for commercial, military, and homeland security purposes, among others. To ensure that what is learned is useful, one seeks to describe the data-generating process using the best available model. This corresponds to model selection.

Two questions that affect the model selection problem in an online data streaming setting are explored, one of content ("what") and one of time ("when"): (1) What data points are needed to build a good predictive model? (2) When does the data generation model change? These questions correspond to the active learning problem and the change detection problem, respectively. The active learning problem involves selecting informative but as yet unlabeled data points to label; its solution aims to build a good model while labeling as few data points as possible. The change detection problem involves recognizing deviation from the existing data generation model; its solution aims to detect the change as quickly as possible.

In this dissertation, an active learning strategy based on transductive inference and a change detection strategy using martingales are proposed. The contributions of this dissertation are as follows: (1) An active learning strategy based on transductive inference is proposed and justified. The strategy is shown empirically to be feasible and compares favorably with other stream-based active learning strategies. (2) Change detection in data streams based on testing the exchangeability condition using martingales is proposed. The feasibility of the three proposed martingale tests for change detection is shown empirically on both labeled and unlabeled data points.

The advantages of the proposed one-pass incremental martingale change detection method are that it (i) does not require a sliding window on the data stream, (ii) does not require monitoring explicit performance measures (e.g., classification error) as data points stream in, and (iii) works well for high-dimensional data streams. The change detection method is used to implement (i) an online adaptive learning algorithm for labeled data streams, which compares favorably with a sliding window method; and (ii) a single-pass video-shot change detector for unlabeled video streams, which compares favorably with some standard off-line methods.
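
The idea of detecting change by testing exchangeability with a martingale can be illustrated with a minimal Python sketch. It is not one of the three tests proposed in the dissertation, whose details the abstract does not give. It assumes a randomized power martingale over p-values, a strangeness measure defined as the distance of each point from the mean of all points seen so far, a parameter epsilon = 0.92, and an alarm threshold of 20; the function name detect_change and all parameter names are hypothetical. For clarity the sketch stores the points and recomputes the strangeness values at every step, so it is not incremental in the sense claimed for the proposed method.

```python
import math
import random


def detect_change(stream, epsilon=0.92, threshold=20.0, seed=0):
    """Return the index of the first alarm, or None if no alarm is raised.

    Illustrative only: strangeness measure, epsilon, and threshold are
    assumptions, not values taken from the dissertation.
    """
    rng = random.Random(seed)
    log_threshold = math.log(threshold)
    seen = []      # all points observed so far
    log_m = 0.0    # log of the martingale value, starting at M = 1

    for index, x in enumerate(stream):
        seen.append(x)

        # Assumed strangeness: distance to the mean of all points so far.
        mean = sum(seen) / len(seen)
        scores = [abs(v - mean) for v in seen]
        newest = scores[-1]

        # Randomized p-value of the newest point among all scores.
        greater = sum(1 for s in scores if s > newest)
        equal = sum(1 for s in scores if s == newest)
        theta = 1.0 - rng.random()  # in (0, 1], keeps the p-value positive
        p_value = (greater + theta * equal) / len(scores)

        # Power martingale update: M_n = M_{n-1} * epsilon * p^(epsilon - 1).
        log_m += math.log(epsilon) + (epsilon - 1.0) * math.log(p_value)

        # Alarm when the martingale exceeds the threshold.
        if log_m > log_threshold:
            return index

    return None
```

While the stream is exchangeable, the p-values are uniformly distributed and the martingale is unlikely ever to grow large: by Ville's inequality, the probability that it ever reaches a threshold lambda is at most 1/lambda. After a change, new points tend to be stranger than old ones, the p-values become small, and the martingale grows until it crosses the threshold. Raising the threshold lowers the false-alarm rate at the cost of a longer detection delay.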
Keywords/Search Tags:Data, Streams, Transductive inference, Change, Compares favorably, Active learning, Martingale, Using