A tree-based summarization framework for differences between two data sets

Posted on:2010-11-21

Degree:M.S.

Type:Thesis

University:Kent State University

Candidate:Wang, Dong

GTID:2448390002484059

Subject:Computer Science

Abstract/Summary:

This work addresses the problem of describing the difference between two data sets. A framework is developed to quantify that difference, under the assumption that it is induced by the two data sets having different statistical distributions. Beyond quantification, the framework also provides an intuitive explanation of the difference: a decision-tree-like structure is built to identify the point(s) where the difference is most interesting. A dynamic programming algorithm is developed that yields the globally optimal solution, but its computational complexity is high. To improve efficiency, a greedy algorithm is also proposed. Both algorithms are tested on synthetic and real data sets.
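
The abstract does not specify the difference measure or the splitting rule, so the following is only an illustrative sketch of what a greedy, tree-based difference summary might look like, not the thesis's algorithm. It assumes numeric attributes and scores a region by the absolute gap between the share of data set A and the share of data set B that fall inside it; the names `Node`, `gap`, `best_split`, `build`, and `summarize` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Row = Sequence[float]


@dataclass
class Node:
    n_a: int                         # rows of data set A in this region
    n_b: int                         # rows of data set B in this region
    feature: Optional[int] = None    # split attribute index (None for a leaf)
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # rows with value <= threshold
    right: Optional["Node"] = None   # rows with value > threshold


def gap(n_a: int, n_b: int, total_a: int, total_b: int) -> float:
    """Assumed measure: |share of A in region - share of B in region|."""
    return abs(n_a / total_a - n_b / total_b)


def best_split(a: List[Row], b: List[Row], total_a: int,
               total_b: int) -> Optional[Tuple[float, int, float]]:
    """Return (score, feature, threshold) maximizing the summed child gaps."""
    best = None
    n_features = len((a or b)[0])
    for f in range(n_features):
        values = sorted({row[f] for row in a + b})
        # Candidate thresholds are midpoints between consecutive values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            la = sum(1 for r in a if r[f] <= t)
            lb = sum(1 for r in b if r[f] <= t)
            score = (gap(la, lb, total_a, total_b)
                     + gap(len(a) - la, len(b) - lb, total_a, total_b))
            if best is None or score > best[0]:
                best = (score, f, t)
    return best


def build(a: List[Row], b: List[Row], total_a: int, total_b: int,
          depth: int) -> Node:
    node = Node(len(a), len(b))
    if depth == 0 or not (a or b):
        return node
    split = best_split(a, b, total_a, total_b)
    # Greedy step: split only if the children expose more difference
    # than the region already shows on its own.
    if split is None or split[0] <= gap(len(a), len(b), total_a, total_b) + 1e-12:
        return node
    _, f, t = split
    node.feature, node.threshold = f, t
    node.left = build([r for r in a if r[f] <= t],
                      [r for r in b if r[f] <= t], total_a, total_b, depth - 1)
    node.right = build([r for r in a if r[f] > t],
                       [r for r in b if r[f] > t], total_a, total_b, depth - 1)
    return node


def summarize(a: List[Row], b: List[Row], max_depth: int = 3) -> Node:
    """Greedily build a tree whose leaves are regions where A and B differ."""
    if not a or not b:
        raise ValueError("both data sets must be non-empty")
    return build(list(a), list(b), len(a), len(b), max_depth)


# Usage: two one-attribute data sets whose ranges only partly overlap.
tree = summarize([(0,), (1,), (2,), (3,)], [(2,), (3,), (4,), (5,)])
```

Because each step picks the locally best split and never revisits it, a sketch like this is cheap but can miss the globally best tree, which is the trade-off the abstract describes between the greedy and dynamic programming algorithms.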

Keywords/Search Tags:

Data sets, Framework
