This work addresses the problem of describing the difference between two data sets. A framework is developed to quantify that difference, under the assumption that it is induced by the different statistical distributions of the two data sets. Beyond quantification, the framework also provides an intuitive explanation of the difference: a decision-tree-like structure is built to interpret the interesting point(s) of the difference. A dynamic programming algorithm is developed that gives the globally optimal solution, but it has high computational complexity. To improve efficiency, a greedy algorithm is also proposed. Both algorithms are tested on synthetic and real data sets.
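To make the idea of a tree-like explanation concrete, the following is a minimal sketch of a single greedy split step. It is not the algorithm of this work: the function name `best_split`, the axis-aligned threshold form `x[f] <= t`, and the gap measure (absolute difference in the fraction of points falling below the threshold) are all assumptions chosen for illustration.

```python
from typing import Sequence, Tuple


def best_split(
    a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]
) -> Tuple[int, float, float]:
    """Return (feature index, threshold, gap) for the single split
    x[f] <= t that maximizes |P_a(x[f] <= t) - P_b(x[f] <= t)|.

    Hypothetical helper: the gap measure is an illustrative assumption.
    """
    best = (0, 0.0, -1.0)
    n_features = len(a[0])
    for f in range(n_features):
        # Candidate thresholds: every value observed in either data set.
        candidates = sorted({row[f] for row in a} | {row[f] for row in b})
        for t in candidates:
            frac_a = sum(row[f] <= t for row in a) / len(a)
            frac_b = sum(row[f] <= t for row in b) / len(b)
            gap = abs(frac_a - frac_b)
            # Strict comparison keeps the first of several tied splits.
            if gap > best[2]:
                best = (f, t, gap)
    return best


# Two toy data sets that agree on feature 0 and differ on feature 1.
a = [(0.1, 5.0), (0.2, 6.0), (0.3, 7.0), (0.4, 8.0)]
b = [(0.1, 7.5), (0.2, 8.5), (0.3, 9.0), (0.4, 9.5)]
feature, threshold, gap = best_split(a, b)
```

Under these assumptions, a greedy procedure would apply such a step recursively to each side of the chosen split, whereas an exact method would have to compare whole trees, which hints at why an optimal search is more expensive.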