Visual Cluster Analysis of Temporal Sequences

Posted on: 2015-03-31
Degree: Ph.D
Type: Dissertation
University: University of California, Davis
Candidate: Wei, Jishang
Full Text: PDF
GTID: 1478390020952119
Subject: Computer Science
Abstract/Summary:
Clustering is the task of dividing data into groups of similar data objects. How to cluster a data set depends on an analyst's intended use of the data. In a typical cluster analysis scenario, the analyst clusters the data, evaluates the result, and re-runs the algorithm until she obtains a satisfying result. Throughout the process, the analyst is in control and guides the clustering.

Visual cluster analysis investigates how to design automatic clustering algorithms coupled with usable interaction and visualization techniques to enable analysts to drive the clustering process. This dissertation specifically focuses on visual cluster analysis of temporal sequences. Although some particular techniques have been developed for temporal sequence data, the general framework and system design principles are applicable to other data types.

Humans and data are two major factors that present challenges to visual cluster analysis research. On one hand, humans are central to cluster analysis. Developing intuitive interaction and illuminating visualization to allow analysts to direct and optimize the clustering process is not a trivial task. On the other hand, we are generating data in large volumes with unprecedented complexity. It is challenging to design scalable visualization and clustering algorithms given the sheer data size and complexity.

This dissertation addresses the above challenges by introducing novel interaction, visualization, and clustering approaches. The dissertation contributions lie in the design and application of visual cluster analysis for studying large-scale temporal sequence data. First, a sketch-based interface is designed for classifying trajectories, which are temporal numerical sequences. Second, an interactive clustering algorithm is tailored to incorporate humans' guidance in cluster analysis. Third, a parallel model-based clustering algorithm is devised to handle large temporal data using multiple CPUs and GPUs. Fourth, a scalable visualization approach is developed to summarize and present massive clickstreams, which are temporal categorical sequences. Fifth, a K centroid chains algorithm is adapted to support user-guided cluster analysis.

This dissertation also presents a visual cluster analysis framework. Under this framework, three systems were created for analyzing different types of temporal sequences. These systems have been deployed to analyze real-world data, such as the temporal correlation curves from a turbulent combustion simulation and clickstreams on the eBay website. By using these systems, analysts can more easily confirm known patterns, gain new insights, and make sense of phenomena by correlating multiple data facets. The successful applications of these systems demonstrate the power of visual cluster analysis methods to exploit Big Data.
Keywords/Search Tags: Cluster, Data, Temporal, Sequences, Systems