Uncovering and Managing the Impact of Methodological Choices for the Computational Construction of Socio-Technical Networks from Texts

Posted on: 2013-09-12
Degree: Ph.D.
Type: Thesis
University: Carnegie Mellon University
Candidate: Diesner, Jana
Full Text: PDF
GTID: 2455390008978185
Subject: Computer Science
Abstract/Summary:
Socio-technical networks are ubiquitous and impact society on many dimensions. As individuals become socialized into those networks, they alternately internalize network behavior or transform network behavior through their participation. Frequently, the functioning of networks involves communication within the network or processing of communication and information originating outside the network. Such communication and information data are often available as unstructured, natural language text data. Often in prior work, text data are analyzed separately from relational data, or are reduced to the fact and frequency of the flow of information between nodes. The latter approach acknowledges that information exchange has taken place, but disregards the content of the text data. However, we know that by not considering the substance of communication and information, we are limited in our ability to understand the effects of language use in networks, including the interplay and co-evolution of information and network structure and behavior. Thus, we expect that in bringing together text data and relational data, we will be able to make substantial advances in network analysis. A complicating factor is that sometimes the structure and behavior of networks are encoded in the text data themselves. In these cases, network data need to be extracted from text data. I propose to develop, apply and evaluate a set of computational methods that facilitate the joint analysis of relational data and the content of text data. In working towards this goal, I use an interdisciplinary and computationally rigorous approach that combines theory and models from social science and socio-linguistics with methods from natural language processing and machine learning that are based on probabilistic graphical models. The datasets used for this work are the Enron email data, data about research funding, and a dataset about the Sudan.
The anticipated contributions include:
- Provide and evaluate methods that will be integrated into the publicly available software products AutoMap and ORA.
- Clean and normalize public datasets that contain relational data and text data in order to ensure that each node represents one unique social entity and no entity is represented by more than one node.

The overall goal of this thesis is to provide methods that support users in collecting rich network data that allow for meaningful and actionable analysis.
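The node normalization goal stated above (one node per social entity, no entity split across several nodes) can be illustrated with a minimal sketch. This is not the thesis's method and does not use AutoMap or ORA; the function names, the alias table, and the toy edge list are all hypothetical, and real data would require a far more careful entity resolution step.

```python
from collections import Counter


def normalize_name(name):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def merge_aliases(edges, aliases):
    """Collapse alias nodes into one canonical node and sum edge frequencies.

    edges:   iterable of (source, target) pairs, one per observed message
    aliases: dict mapping a normalized alias to its canonical node label
             (hypothetical, hand-built lookup table)
    """
    def canonical(name):
        key = normalize_name(name)
        return aliases.get(key, key)

    weights = Counter()
    for source, target in edges:
        s, t = canonical(source), canonical(target)
        if s != t:  # drop self-loops that appear once aliases are merged
            weights[(s, t)] += 1
    return dict(weights)


# Hypothetical toy data: three messages, two of them between the same two people.
edges = [
    ("J. Smith", "Ann Lee"),
    ("john smith", "ann lee"),
    ("John  Smith", "J. Smith"),
]
aliases = {"j. smith": "john smith"}
result = merge_aliases(edges, aliases)
print(result)
```

In this toy case the three raw sender labels collapse to a single node, so the two messages to the same recipient are counted on one edge and the message between two aliases of the same person is discarded as a self-loop.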
Keywords/Search Tags: Network, Data, Text, Methods