Predicting latent demographic attributes of Twitter users

Posted on: 2017-12-07
Degree: M.S.
Type: Thesis
University: University of Maryland, Baltimore County
Candidate: Frolov, Georgiy
GTID: 2468390014966492
Subject: Computer Science
Abstract/Summary:
Social media websites such as Twitter, Facebook, and LinkedIn aggregate large amounts of textual data. A wealth of user information can be inferred from this data, which is potentially useful in advertising, analytics, sentiment analysis, and similar applications. It is estimated that over 60% of people in the US have a Twitter account, and a significant portion of the US population consists of immigrants. As social media have become commonplace, people willingly post personal information such as their name, age, location, and alma mater.

This makes it possible to use text classification methods to accurately determine demographic profiles. This thesis focuses on extracting latent demographic information from social media data. Whereas previous work has attempted to determine users' race and ethnicity, our work uses posts on Twitter (tweets) to determine whether a user is an immigrant or a native US citizen. The method uses the distribution of ethnic names among immigrant and native populations to find users in the United States and collect their tweets across three race groups: Asian, Latino, and Caucasian/White. We use a supervised machine learning approach to predict the immigration status of a user from the textual content of their tweets, using Multinomial Naive Bayes, Support Vector Machines, Logistic Regression, k-Nearest Neighbors, and Decision Trees. We investigate methods for improving the performance of these algorithms and determine how the number of features affects the accuracy of the resulting models. Additionally, we evaluate which features carry more weight in classifying users, and attempt to discover latent topical patterns in the data corpus using Latent Dirichlet Allocation.
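The supervised classification step described in the abstract can be illustrated with a from-scratch Multinomial Naive Bayes model, the first of the five classifiers listed. This is a minimal sketch under stated assumptions, not the thesis's implementation. The class name `ToyMultinomialNB` (not scikit-learn's class), the whitespace tokenizer, the `max_features` vocabulary cutoff (one hypothetical way to vary the number of features), the log-ratio ranking used to show which words carry the most weight, and the toy tweets and labels are all invented for illustration and do not come from the thesis data.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace. A real tweet
    # pipeline would also handle URLs, @mentions, hashtags, and punctuation.
    return text.lower().split()


class ToyMultinomialNB:
    """Multinomial Naive Bayes over bag-of-words counts with Laplace smoothing."""

    def __init__(self, alpha=1.0, max_features=None):
        self.alpha = alpha                # additive (Laplace) smoothing constant
        self.max_features = max_features  # None keeps every word seen in training

    def fit(self, docs, labels):
        tokenized = [tokenize(d) for d in docs]
        overall = Counter(w for toks in tokenized for w in toks)
        if self.max_features is None:
            self.vocab = set(overall)
        else:
            # Keep only the N most frequent words across the whole corpus.
            self.vocab = {w for w, _ in overall.most_common(self.max_features)}
        self.classes = sorted(set(labels))
        counts = {c: Counter() for c in self.classes}
        for toks, label in zip(tokenized, labels):
            counts[label].update(w for w in toks if w in self.vocab)
        n_docs = Counter(labels)
        v = len(self.vocab)
        # log P(class): fraction of training documents in each class.
        self.log_prior = {c: math.log(n_docs[c] / len(labels)) for c in self.classes}
        # log P(word | class) with add-alpha smoothing over the vocabulary.
        self.log_lik = {}
        for c in self.classes:
            total = sum(counts[c].values())
            self.log_lik[c] = {
                w: math.log((counts[c][w] + self.alpha) / (total + self.alpha * v))
                for w in self.vocab
            }
        return self

    def predict(self, doc):
        # Sum log-probabilities; words outside the vocabulary are ignored.
        scores = {
            c: self.log_prior[c]
            + sum(self.log_lik[c][w] for w in tokenize(doc) if w in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)

    def top_features(self, target, other, k=5):
        # Rank words by log P(w|target) - log P(w|other); ties broken alphabetically.
        diff = {w: self.log_lik[target][w] - self.log_lik[other][w] for w in self.vocab}
        return sorted(self.vocab, key=lambda w: (-diff[w], w))[:k]


# Toy, invented tweets; not drawn from the thesis corpus.
train_docs = [
    "visa interview at the consulate today",
    "waiting on my visa renewal",
    "calling family back home tonight",
    "tailgate before the football game",
    "road trip across the state this weekend",
    "football season is finally here",
]
train_labels = ["immigrant"] * 3 + ["native"] * 3

model = ToyMultinomialNB().fit(train_docs, train_labels)
print(model.predict("my visa appointment is at the consulate"))  # immigrant
print(model.predict("football game this weekend"))               # native
print(model.top_features("immigrant", "native", k=1))            # ['visa']
```

Retraining with different `max_features` values and comparing held-out accuracy is one simple way to probe how the number of features affects a model; the smoothing term keeps unseen class/word pairs from zeroing out a class score.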
Keywords/Search Tags: Latent, Twitter, User, Data, Demographic