
A tree-based summarization framework for differences between two data sets

Posted on: 2010-11-21
Degree: M.S.
Type: Thesis
University: Kent State University
Candidate: Wang, Dong
GTID: 2448390002484059
Subject: Computer Science
Abstract/Summary:
This work addresses the problem of describing the difference between two data sets. A framework is developed to quantify this difference, given that it is induced by the differing statistical distributions of the two data sets. Beyond quantification, the framework also provides an intuitive explanation of the difference: a decision-tree-like structure is built to highlight the interesting point(s) of the difference. A dynamic programming algorithm is developed to find the globally optimal solution, but it has high computational complexity. To improve efficiency, a greedy algorithm is proposed. Both algorithms are evaluated on synthetic and real data sets.
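The abstract does not specify the difference measure or the split criterion. The sketch below is therefore only an illustration under stated assumptions and not the thesis's algorithm. It assumes that the difference within a region is the gap between the fraction of each data set falling into that region. A tree is scored by half the sum of these gaps over its leaves, which is a total-variation-style measure. The tree is grown greedily with axis-aligned threshold splits up to a fixed depth. The names `Node`, `build`, and `difference` are hypothetical. The exhaustive dynamic programming search mentioned in the abstract is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    # Fractions of the full data sets A and B (not of this node) falling in this region.
    frac_a: float
    frac_b: float
    feature: Optional[int] = None      # split feature index (None for a leaf)
    threshold: Optional[float] = None  # points with x[feature] <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def leaves(self) -> List["Node"]:
        if self.left is None:
            return [self]
        return self.left.leaves() + self.right.leaves()


def difference(root: Node) -> float:
    """Assumed score: half the sum of |pA - pB| over the leaf regions."""
    return 0.5 * sum(abs(n.frac_a - n.frac_b) for n in root.leaves())


def build(a: Sequence[Point], b: Sequence[Point], n_a: int, n_b: int, depth: int) -> Node:
    """Hypothetical greedy builder: pick the split with the largest gain in |pA - pB|."""
    node = Node(len(a) / n_a, len(b) / n_b)
    if depth == 0 or len(a) + len(b) < 2:
        return node
    base = abs(node.frac_a - node.frac_b)
    best_gain, best = 0.0, None
    dims = len((a or b)[0])
    for f in range(dims):
        # Candidate thresholds: every distinct value of feature f in this region.
        for t in sorted({p[f] for p in list(a) + list(b)}):
            la = sum(1 for p in a if p[f] <= t)
            lb = sum(1 for p in b if p[f] <= t)
            ra, rb = len(a) - la, len(b) - lb
            score = abs(la / n_a - lb / n_b) + abs(ra / n_a - rb / n_b)
            if score - base > best_gain + 1e-12:
                best_gain, best = score - base, (f, t)
    if best is None:
        return node  # no split increases the difference; keep as a leaf
    f, t = best
    node.feature, node.threshold = f, t
    node.left = build([p for p in a if p[f] <= t], [p for p in b if p[f] <= t], n_a, n_b, depth - 1)
    node.right = build([p for p in a if p[f] > t], [p for p in b if p[f] > t], n_a, n_b, depth - 1)
    return node


# Usage: two 1-D data sets that occupy disjoint intervals.
a = [(0.1,), (0.2,), (0.3,), (0.4,)]
b = [(0.6,), (0.7,), (0.8,), (0.9,)]
root = build(a, b, len(a), len(b), depth=2)
print(root.feature, root.threshold, difference(root))  # prints "0 0.4 1.0"
```

Because the greedy builder commits to the locally best split at each node, it avoids enumerating whole trees. The trade-off is that it can miss the globally optimal tree, which an exhaustive search such as the dynamic programming approach described in the abstract would find.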
Keywords/Search Tags: Data sets, Framework