Online Knowledge Discovery With Streaming Features

Posted on: 2014-03-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Yu
Full Text: PDF
GTID: 1268330398979586
Subject: Computer application technology
Abstract/Summary:
Compared with traditional online knowledge discovery over a static feature space, online knowledge discovery over a dynamic feature space has attracted little attention. A feature space is dynamic when not all features are available before learning begins or when the feature space changes over time. As a result, the feature space of the training data can become high-dimensional and uncertain, which is challenging for traditional online knowledge discovery algorithms.

To explore online knowledge discovery with a dynamic feature space, we define the concept of streaming features to model feature spaces that are high-dimensional yet dynamic, without requiring the whole feature space before learning starts. With streaming features, the features flow in one by one and each feature is processed online upon its arrival, while the number of instances is fixed. Using streaming features, this dissertation studies online knowledge discovery with a dynamic feature space. Our main contributions are as follows.

(1) We propose a new online feature selection framework for applications with streaming features, where the full feature space is unknown in advance. Within this framework, we present a novel Online Streaming Feature Selection (OSFS) method that selects strongly relevant and non-redundant features on the fly. We also propose an efficient variant, Fast-OSFS, to improve feature selection performance. Experimental results demonstrate that the algorithms achieve better compactness and higher prediction accuracy than existing streaming feature selection algorithms.

(2) We study a new research problem: the discovery of local causal relationships in the context of streaming features. Using a causal Bayesian network to represent causal relationships, we propose a novel algorithm, CDFSF (Causal Discovery From Streaming Features), to discover local causal relationships from streaming features. To improve the efficiency of CDFSF, we exploit the symmetry between parents (causes) and children (effects) in a faithful Bayesian network and present a variant of CDFSF, S-CDFSF (Symmetrical CDFSF). Experimental results validate our algorithms in comparison with existing algorithms for causal relationship discovery.

(3) Mining emerging patterns (EPs) is a challenging problem in the context of streaming features. To address it, we first propose two EP miners for a high-dimensional yet static feature space, called CE-EP and MB-EP, where CE stands for direct Causes and direct Effects, and MB for Markov Blanket. To mine EPs from a high-dimensional yet dynamic feature space, we then present a novel streaming pattern mining technique, EPSF (mining Emerging Patterns with Streaming Feature selection). Unlike CE-EP and MB-EP, EPSF can mine EPs from both static and dynamic high-dimensional feature spaces. Extensive experiments on a broad range of datasets show the effectiveness of the CE-EP, MB-EP, and EPSF classifiers against other well-established methods in terms of predictive accuracy, number of patterns, running time, and sensitivity analysis.

(4) We apply our proposed methods, including OSFS, Fast-OSFS, CE-EP, and EPSF, to a case study on automatic impact crater detection in real planetary images. Extensive studies reveal the advantages of our methods over existing streaming feature selection algorithms, crater detection methods, and well-known feature selection algorithms. This case study also validates our proposed methods on real-world data.
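
The streaming-features setting described above (a fixed set of instances, with features arriving one at a time and each kept or discarded on arrival) can be illustrated with a minimal Python sketch. This is not the dissertation's OSFS or Fast-OSFS algorithm: the abstract does not specify the statistical tests or the search over conditioning subsets, so the sketch assumes a plain correlation threshold for relevance and a partial correlation conditioned on a single already-kept feature for redundancy. The names `pearson`, `partial_corr`, `is_redundant`, `select_streaming`, and the `threshold` value are all hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """Correlation of x and y after controlling for a single variable z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    denom = math.sqrt(max(0.0, (1 - r_xz ** 2) * (1 - r_yz ** 2)))
    if denom < 1e-12:
        # x or y is (almost) fully explained by z, so nothing is left to correlate.
        return 0.0
    return (r_xy - r_xz * r_yz) / denom


def is_redundant(column, target, others, threshold):
    """Assumed proxy: redundant if some other kept feature explains away the link to the target."""
    return any(abs(partial_corr(column, target, z)) < threshold for z in others)


def select_streaming(feature_stream, target, threshold=0.2):
    """Process (name, column) pairs one at a time and return the names kept."""
    selected = {}
    for name, column in feature_stream:
        # Relevance check: discard features weakly correlated with the target.
        if abs(pearson(column, target)) < threshold:
            continue
        # Redundancy check on the newly arrived feature.
        if is_redundant(column, target, selected.values(), threshold):
            continue
        selected[name] = column
        # Re-examine earlier features, which the new one may have made redundant.
        for f in list(selected):
            others = [col for g, col in selected.items() if g != f]
            if is_redundant(selected[f], target, others, threshold):
                del selected[f]
    return list(selected)


# Toy data: 8 instances, target depends on a and b only.
a = [1, 1, 1, 1, -1, -1, -1, -1]
b = [1, 1, -1, -1, 1, 1, -1, -1]
c = [1, -1, 1, -1, 1, -1, 1, -1]       # uncorrelated with the target
a_scaled = [2 * v for v in a]          # carries the same information as a
y = [2 * u + v for u, v in zip(a, b)]

kept = select_streaming([("a", a), ("c", c), ("b", b), ("a_scaled", a_scaled)], y)
print(kept)  # ['a', 'b']
```

In this toy run, `c` is dropped as irrelevant and `a_scaled` is dropped as redundant given `a`, so the selected set stays compact as features arrive, without ever requiring the full feature space up front.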
Keywords/Search Tags:Streaming features, Feature selection, Local causal discovery, Emerging patterns, Crater detection