Twitter Analytics: Geotag Imputation, Forecasting, and Dynamic Variable Selection

Posted on: 2019-09-21
Degree: Ph.D.
Type: Dissertation
University: North Carolina State University
Candidate: Bakerman, Jordan
Full Text: PDF
GTID: 1478390017485162
Subject: Statistics
Abstract/Summary:
The popularity of social media has created vast repositories of open-source data with broad potential value. Researchers are actively mining these new complex data sources to create predictive models for wide-ranging applications. For example, Wikipedia is used to forecast influenza in the United States [Hickmann et al., 2015], Facebook is used for more effective advertising [Backstrom et al., 2010], and Twitter is used to forecast civil unrest in Latin America [Korkmaz et al., 2015]. In this dissertation, we create statistical methodology advancing the analytical value of Twitter.

We begin in Chapter 2 by developing a geotag imputation method to predict the origin of individual tweets. Standard practice estimates the origin from the content of the tweet, from network information, or from these two features treated independently. We show improved accuracy by using both tweet text and user network information jointly. Moreover, we properly account for uncertainty, improving both precision and coverage of geotag imputation.

In Chapter 3 we focus on short-term forecasting using daily word counts scraped from Twitter as model features. Conventional forecasting models in the area of social media are typically static, so researchers implicitly assume time-invariant data. We consider a dynamic approach to account for possible time dependencies, which allows the forecasting model to evolve in time along with the data generating process. For the problem of civil unrest, we use dynamic logistic regression to forecast the probability of protest in Latin America and show improved accuracy compared to the static baseline model. Furthermore, we develop a dynamic variable selection technique based on penalized credible regions in order to contextualize the reasons for protest. The proposed methodology is scalable and outperforms the current baseline.

In Chapter 4, we combine the geotag imputation and dynamic model methodology of the previous chapters. This final project is a first step in using tweets with imputed geotags within geographic-specific forecast models. The goal is to understand the impact of measurement error due to the location uncertainty of tweets.
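
As a rough illustration of the dynamic logistic regression idea mentioned for Chapter 3, the sketch below lets the coefficients drift over time by discounting past information with a forgetting factor and applying one Newton-style update per day. This is a generic textbook-style filter, not the dissertation's implementation: the function name `dynamic_logistic_filter`, the parameters `forgetting` and `prior_var`, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    # Clip the linear predictor to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def dynamic_logistic_filter(X, y, forgetting=0.98, prior_var=10.0):
    """One-pass dynamic logistic regression (hypothetical helper).

    Returns the one-step-ahead forecast probabilities and the final
    coefficient estimate. A forgetting factor below 1 inflates the
    coefficient covariance each step, so older data count for less.
    """
    n, d = X.shape
    theta = np.zeros(d)            # coefficient estimate
    cov = prior_var * np.eye(d)    # approximate posterior covariance
    forecasts = np.empty(n)
    for t in range(n):
        x = X[t]
        pred_cov = cov / forgetting        # prediction step: discount the past
        p = sigmoid(x @ theta)             # forecast made before seeing y[t]
        forecasts[t] = p
        w = p * (1.0 - p)                  # logistic curvature at the forecast
        Rx = pred_cov @ x
        # Sherman-Morrison form of (pred_cov^{-1} + w * x x^T)^{-1}
        cov = pred_cov - w * np.outer(Rx, Rx) / (1.0 + w * (x @ Rx))
        theta = theta + cov @ x * (y[t] - p)   # single Newton-style update
    return forecasts, theta


# Synthetic example (assumed data): the slope drifts from -2 to 2 over time,
# which a static logistic regression could not track.
rng = np.random.default_rng(0)
n_days = 300
X = np.column_stack([np.ones(n_days), rng.normal(size=n_days)])
true_slope = np.linspace(-2.0, 2.0, n_days)
y = rng.binomial(1, sigmoid(true_slope * X[:, 1]))

forecasts, theta_final = dynamic_logistic_filter(X, y)
```

Each entry of `forecasts` is computed before the corresponding outcome is used, so the sequence can be scored as a genuine one-step-ahead forecast. Setting `forgetting=1.0` removes the discounting and gives a recursive approximation to a static model, which is one way to set up a static baseline for comparison.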
Keywords/Search Tags:Geotag imputation, Forecast, Dynamic, Twitter, Model, Data