Hierarchical models for relational data

Posted on: 2010-07-10
Degree: Ph.D
Type: Dissertation
University: Harvard University
Candidate: Thomas, Andrew Christopher
Full Text: PDF
GTID: 1448390002974721
Subject: Statistics
Abstract/Summary:
Scientific investigations of processes on networks (more generally, dyadic relational data) often assume that the data collected on each relation between individuals is without error; that is, the representation given to the connection is without noise or random variation. Other investigations on networks seek to infer an underlying process of interest that generates connections between individuals. These observations motivate an investigation into three broader topics on this theme: generative models that produce network structures observable in natural, technological and social situations; interpretations of topologies on networks beyond geodesic measures; and the consequences of commonly observed data compression schemes on network tie strengths, namely tie value dichotomization.

I begin by reviewing the past several decades of the development of generative network structures that allow for stochastic variation, then integrate many of them into the common framework of hierarchical Generalized Linear Models (GLMs) while adding useful tools and interpretations from other areas of statistics, in particular data augmentation schemes to speed up Gibbs sampling and robust analysis methods. I then use these tools to analyze a map of the human brain and a network of associations between United States senators.

I then examine the standard toolkit of network summary statistics, based largely on geodesic statistics, and propose an alternative set of measures based on Ohmic circuits, which allow for the inclusion of parallel pathways and are considerably more sensitive to small changes than their geodesic counterparts.

Given this new toolkit, I use these statistics to examine three methods of (lossy) data compression in networked systems: "thresholding", in which the graph is dichotomized into (0,1) binary form about a fixed threshold value; "name-one-friend", in which respondents are limited in the number of connections they may demonstrate, typically as a consequence of network design; and deliberate outdegree censoring, which applies the previous method at the analysis stage as a possible alternative to thresholding. I show that even when compression seems to be a convenient strategy, its usefulness is outweighed by the introduction of bias and the loss of information.
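
As a rough illustration of the last two ideas, and not the dissertation's own code, the sketch below computes a circuit-style distance (effective resistance) on a small weighted network before and after thresholding. It assumes tie strengths act as conductances and uses the standard Laplacian pseudoinverse formula; the function names, the example network `ties`, and the cutoff of 1.0 are all hypothetical.

```python
import numpy as np


def effective_resistance(weights):
    """Pairwise effective resistance, treating each tie strength as a conductance.

    Uses R_ij = L+_ii + L+_jj - 2 L+_ij, where L+ is the pseudoinverse of the
    graph Laplacian. Values are only meaningful for node pairs that lie in the
    same connected component.
    """
    w = np.asarray(weights, dtype=float)
    laplacian = np.diag(w.sum(axis=1)) - w
    pinv = np.linalg.pinv(laplacian)
    d = np.diag(pinv)
    return d[:, None] + d[None, :] - 2.0 * pinv


def threshold(weights, cutoff):
    """Dichotomize tie strengths: 1 where strength >= cutoff, else 0."""
    return (np.asarray(weights, dtype=float) >= cutoff).astype(float)


# Hypothetical 4-node network with two parallel routes from node 0 to node 3:
# a strong route through node 1 and a weak route through node 2.
ties = np.array([
    [0.0, 2.0, 0.5, 0.0],
    [2.0, 0.0, 0.0, 2.0],
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 2.0, 0.5, 0.0],
])

r_valued = effective_resistance(ties)[0, 3]
r_binary = effective_resistance(threshold(ties, cutoff=1.0))[0, 3]
print(r_valued, r_binary)
```

In this toy case the hop-count geodesic between nodes 0 and 3 is 2 both before and after thresholding, while the effective resistance changes because the weak parallel route is discarded and the strong ties lose their magnitude.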
Keywords/Search Tags: Data, Network, Models