Modeling economic and financial behavior from large-scale datasets

Posted on: 2015-09-23
Degree: Ph.D.
Type: Thesis
University: Indiana University
Candidate: Mao, Huina
GTID: 2478390020451869
Subject: Information Science
Abstract/Summary:
Every facet of our daily life is being recorded by our communication infrastructure. Billions of individuals leave digital traces in social media, search engines, phone records, emails, GPS data, shopping records, and electronic sensors. These datasets are available at an unprecedented scale, with fine granularity, and in real time. They provide rich resources for research and enhance our ability to study individual and collective behavioral patterns. This dissertation focuses on modeling economic and financial behavior from large-scale datasets recorded by social media, search engines, news media, and mobile phone services.

First, we study the dynamic properties of collective and individual mood states in large-scale social media content, in particular Twitter micro-blog updates. We emphasize longitudinal changes, their statistical properties, and their relations to a variety of socio-economic phenomena. As part of this research program, we develop a multi-dimensional mood analysis method that tracks public mood on Twitter in six dimensions, namely calm, alert, sure, vital, kind, and happy, capturing the rich structure of human mood in real time. Applying this method, we discovered that Twitter calmness carries predictive information about short-term stock market returns.

Second, to further explore the relation between collective mood states and market returns, we attempt to disentangle social mood from investor sentiment. We introduce a simple, direct, and unambiguous indicator of online investor sentiment based on the frequencies of finance-related keywords in Twitter updates and Google search queries. We found that Twitter and Google bullishness are positively correlated with, and lead, investor sentiment surveys. We also found that Twitter bullishness predicts increases in stock returns, followed by a reversal to the underlying fundamentals, in line with the Investor Sentiment Hypothesis in behavioral finance.

Third, we study how linguistic and social differences affect public mood states, using a method that automatically converges on a semantic orientation lexicon from the analysis of large-scale news corpora. This work has applications in financial sentiment analysis for non-English languages, for which sentiment analysis resources can be scarce. We assume that the stock market is the product of collective human judgments about the valence of news reports. On that basis, we leverage market movement data to assess the valence of terms in news reports, i.e., to implicitly crowd-source the evaluation of news sentiment. These data are used to iteratively optimize a semantic orientation lexicon through an evolutionary process of semantic extension and pruning. We show that the resulting semantic orientation lexicon achieves high accuracy in identifying the polarity of Chinese news articles.

Finally, we study how patterns of economic development are manifested in large-scale mobile phone communication networks. In a case study of mobile phone records from a developing nation, Côte d'Ivoire, we show that mobile phone communication data can provide an accurate and detailed picture of economic development. This holds even for low-income areas with poor information infrastructure. Our findings lay the groundwork for future research that leverages mobile communication data to analyze the current state of economic development within a nation accurately and at low cost. This would allow governments and aid organizations to respond swiftly and effectively to changing conditions.
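The abstract describes the online bullishness indicator of the second study only as "finance-related keyword frequencies". The sketch below is illustrative and not the dissertation's actual method. It assumes the log-ratio bullishness measure of Antweiler and Frank, ln((1 + bullish count) / (1 + bearish count)). To probe whether the indicator "leads" a survey, it uses a plain lagged Pearson correlation, which is not necessarily the statistical test used in the thesis. The keyword counts, survey series, and function names (`bullishness`, `lagged_correlations`) are all hypothetical.

```python
import math
from statistics import mean


def bullishness(n_bull: int, n_bear: int) -> float:
    """Assumed log-ratio index: > 0 when bullish mentions dominate, 0 when balanced."""
    return math.log((1 + n_bull) / (1 + n_bear))


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / (sx * sy)


def lagged_correlations(indicator: list[float], survey: list[float],
                        max_lag: int) -> dict[int, float]:
    """Correlate indicator[t] with survey[t + lag] for lag = 0..max_lag.

    A high correlation at a positive lag suggests the indicator leads the survey.
    """
    if len(indicator) != len(survey):
        raise ValueError("series must have equal length")
    out = {}
    for lag in range(max_lag + 1):
        x = indicator[: len(indicator) - lag]  # indicator at time t
        y = survey[lag:]                       # survey at time t + lag
        out[lag] = pearson(x, y)
    return out


if __name__ == "__main__":
    # Hypothetical weekly counts of posts containing "bullish" / "bearish".
    bull = [30, 10, 25, 5, 40, 15]
    bear = [10, 20, 5, 25, 10, 30]
    index = [bullishness(b, s) for b, s in zip(bull, bear)]
    # Hypothetical survey series that simply echoes the index one week later.
    survey = [0.0] + index[:-1]
    for lag, r in lagged_correlations(index, survey, max_lag=2).items():
        print(f"lag {lag}: r = {r:+.3f}")
```

With these toy series, the correlation peaks at lag 1 by construction. Real data would need longer series and a proper significance test before any claim of predictive power.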
Keywords/Search Tags: Large-scale, Data, Economic, Semantic orientation lexicon, Investor sentiment, Financial, Mobile phone, Communication