
Interactive visualizations of natural language

Posted on: 2011-05-09
Degree: Ph.D.
Type: Dissertation
University: University of Toronto (Canada)
Candidate: Collins, Christopher Mervin
Full Text: PDF
GTID: 1468390011970381
Subject: Information Science
Abstract/Summary:
While linguistic skill is a hallmark of humanity, the increasing volume of linguistic data each of us faces is causing individual and societal problems: 'information overload' is a commonly discussed condition. Tasks such as finding the most appropriate information online, understanding the contents of a personal email repository, and translating documents from another language are now commonplace. These tasks need not cause stress and feelings of overload: the human intellectual capacity is not the problem. Rather, the computational interfaces to linguistic data are problematic: there exists a Linguistic Visualization Divide in the current state of the art. Through five design studies, this dissertation combines sophisticated natural language processing algorithms with information visualization techniques grounded in evidence of human visuospatial capabilities.

The first design study, Uncertainty Lattices, augments real-time computer-mediated communication, such as cross-language instant messaging chat and automatic speech recognition. By providing explicit indications of algorithmic confidence, the visualization enables informed decisions about the quality of computational outputs.

Two design studies explore the space of content analysis. DocuBurst is an interactive visualization of document content, which spatially organizes words using an expert-created ontology. Broadening from single documents to document collections, Parallel Tag Clouds combine keyword extraction and coordinated visualizations to provide comparative overviews across subsets of a faceted text corpus.

Finally, two studies address visualization for natural language processing research. The Bubble Sets visualization draws secondary set relations around arbitrary collections of items, such as a linguistic parse tree. From this design study we propose a theory of spatial rights to consider when assigning visual encodings to data. Expanding considerations of spatial rights, we present a formalism to organize the variety of approaches to coordinated and linked visualization, and introduce VisLink, a new method to relate and explore multiple 2D visualizations in 3D space. Inter-visualization connections allow for cross-visualization queries and support high-level comparison between visualizations.

From the design studies we distill challenges common to visualizing language data, including maintaining legibility, supporting detailed reading, addressing data scale challenges, and managing problems arising from semantic ambiguity.
Keywords/Search Tags:Visualization, Language, Linguistic, Data, Natural