Visualization for enhancing the data mining process

Data mining seeks to discover useful and novel patterns that may be hidden in large databases. This emerging field has triggered a dynamic industry that applies a heterogeneous variety of statistical and computational techniques to explore and analyze large and complex databases. In the data mining context, information visualization techniques have been widely applied to understand and visually recognize patterns in large and complex datasets. In this dissertation, after describing the data mining process, we mainly investigate how visualization can support other tasks in this exploratory process.

First, we introduce an alternative model of the data mining process, in which visualization is incorporated as a medium to support the interactions between human users and the entities (e.g., datasets, parameter spaces, inductive models, patterns) present in this exploratory process. Second, we propose visualizations to support two tasks in the data mining cycle: (a) exploring the parameter space generated by parameter-tuning algorithms in the algorithm engineering phase; and (b) exploring the models induced by training a data mining algorithm. For parameter selection, we experiment with integrating visualization with algorithmic methods that seek to automatically tune the parameters of data mining algorithms. For inductive models, we develop model visualizations for four data mining algorithms, and we show through several examples how these visualizations can be used to understand how inductive models transform data into patterns and to correlate data with the decisions made by these models. Finally, we describe the algorithm selection problem in data mining and introduce a guide for categorizing data mining algorithms.
