Visual Analytics in High-Dimensional Data with Dichotomous Outcomes

Posted on: 2018-09-24
Degree: Ph.D
Type: Dissertation
University: The University of North Carolina at Charlotte
Candidate: Zhang, Chong
Full Text: PDF
GTID: 1478390020456282
Subject: Computer Science
Abstract/Summary:
High-dimensional data has become common in application areas such as environmental studies and healthcare. The high dimensionality presents opportunities for understanding how certain outcomes happen by identifying the significant variables that contribute to them. Many efforts have been made to address this task. However, automated data analysis techniques often suffer from the "curse of dimensionality" and from results that are difficult to interpret. To integrate human intelligence into the analysis process and to communicate information to users, high-dimensional data visualization techniques have been developed. Unfortunately, high-dimensional data often leads to a cluttered visual display that obscures patterns and hinders understanding of the data. Although a few visual analytics approaches have been developed to bridge automated data analysis and interactive visualization for high-dimensional data, few existing works have focused on finding explanatory relationships between variables and outcomes.

In this dissertation, we address the task along two distinct paths from high-dimensional data with dichotomous outcomes to knowledge. First, we use visualizations to facilitate logit model building, and we propose two approaches. In the first approach, Parallel Coordinates is used to facilitate dimension reduction based on correlation analysis, the first step of logit model building. It addresses the difficulties of comparing and exploring correlations when there are hierarchical outcome variables. In the second approach, a visual analytics pipeline is proposed for logit modeling. It builds on the traditional modeling pipeline by providing (1) intuitive visualizations for inspecting statistical indicators and the relationships among the variables and (2) a seamless, effective dimension reduction pipeline for selecting variables for inclusion in high-quality logistic regression models.

Second, we enhance visualizations with automated data analysis. In particular, association rule mining is employed to enhance Parallel Sets for categorical data exploration. Dimension reduction and reordering, based on significant association rules, are conducted to reduce clutter and facilitate visual exploration in Parallel Sets. The effectiveness and efficiency of our approaches are illustrated by a set of case studies and experiments with benchmark datasets.
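
As a rough illustration of how association rules might drive dimension reduction and reordering for a categorical display, the sketch below mines single-antecedent rules of the form (dimension = value) -> outcome, filters them by support and confidence, and orders dimensions by their strongest surviving rule. This is not the dissertation's algorithm: the toy records, the function names `single_item_rules` and `rank_dimensions`, and the thresholds are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy records; "outcome" is the dichotomous target.
records = [
    {"smoker": "yes", "exercise": "low",  "outcome": "sick"},
    {"smoker": "yes", "exercise": "low",  "outcome": "sick"},
    {"smoker": "yes", "exercise": "high", "outcome": "sick"},
    {"smoker": "no",  "exercise": "low",  "outcome": "healthy"},
    {"smoker": "no",  "exercise": "high", "outcome": "healthy"},
    {"smoker": "no",  "exercise": "high", "outcome": "healthy"},
]


def single_item_rules(rows, target, min_support=0.2, min_confidence=0.8):
    """Return rules (dim, value, outcome, support, confidence) that pass both thresholds."""
    n = len(rows)
    item_counts = Counter()   # how often (dim, value) occurs
    pair_counts = Counter()   # how often (dim, value) co-occurs with each outcome
    for row in rows:
        for dim, val in row.items():
            if dim == target:
                continue
            item_counts[(dim, val)] += 1
            pair_counts[(dim, val, row[target])] += 1

    rules = []
    for (dim, val, out), count in pair_counts.items():
        support = count / n                          # P(value and outcome)
        confidence = count / item_counts[(dim, val)]  # P(outcome | value)
        if support >= min_support and confidence >= min_confidence:
            rules.append((dim, val, out, support, confidence))
    return rules


def rank_dimensions(rules):
    """Order dimensions by the confidence of their strongest rule (assumed criterion)."""
    best = {}
    for dim, _val, _out, _support, confidence in rules:
        best[dim] = max(best.get(dim, 0.0), confidence)
    return sorted(best, key=best.get, reverse=True)


rules = single_item_rules(records, target="outcome")
kept_dimensions = rank_dimensions(rules)
```

With these thresholds, dimensions that yield no qualifying rule drop out of `kept_dimensions`, which would mean fewer axes to draw; the returned order could then be used as the axis order. A real pipeline would likely consider multi-item antecedents and a significance test rather than confidence alone.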
Keywords/Search Tags: Data, Visual analytics