Analyzing And Predicting User Behavior In Online Social Networks

Posted on: 2014-08-15 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: C J Xiao
GTID: 1268330401467824 | Subject: Computer software and theory
Abstract/Summary:
Online social networks have attracted several billion users and extended the scope of their social activities. Because these networks record human activities at a large scale, they provide a great opportunity to study human behavior. Analyzing people's online behavior not only helps us understand traditional sociological theories, but is also vital for designing practical applications, such as recommendation systems and more effective information propagation.
In this dissertation, we analyze and predict the behavior of users in online social networks. First, we analyze the correlation between gender, age and country (region) and the selective behavior of disseminators and audiences in three online social networks. Second, building on analyses of gender-related user behavior, we predict the audience gender ratio. Third, we study how user properties and behavior relate to the number of clicks on the short URLs users publish, and we predict that click number. Finally, we extract user interests from the content users publish and find users with similar interests. The main contributions are as follows:
1. We examine the homophily theory in online social networks and analyze the impact of the medium on the breadth and speed of information propagation. Using large-scale data from three online social networks (YouTube, Flickr and Twitter), we study the correlation between users' properties and their selective behavior. We find strong homophily in terms of age, gender and location in all three networks, i.e., audiences tend to select content from disseminators with similar properties. Comparing the three networks, we also find that YouTube videos have the longest lifespan, while Twitter tweets have the shortest. In terms of the global level of information propagation, Flickr's is twice Twitter's, while YouTube's is close to the mean. In addition, dual-role users are ubiquitous in online social networks, but most such users are very active either as disseminators or as audiences, not both.
2. To the best of our knowledge, this is the first work to study the prediction of the audience gender ratio in social networks. We predict the audience gender ratio of YouTube videos both before and after publication. For prediction before publication, we propose and examine two hypotheses: audience consistency and topic consistency. The former states that videos made by the same author tend to have similar male-to-female audience ratios; the latter states that videos on similar topics tend to have similar audience gender ratios. Features derived from these two hypotheses, together with other features, are used in multiple linear regression (MLR) and support vector regression (SVR) models to make the prediction. The analyses show that these two features are the key indicators of audience gender, whereas other features, such as the user's gender and the video's duration, have only a limited relationship with it. The prediction achieves the expected performance. For prediction after publication, we use the early comments received within a short period after a video is published to predict the ratio via simple linear regression (SLR). The experiments indicate that this model can achieve better performance using only a few early comments. We also observe that the relationship between the number of early comments (cost) and the predictive accuracy (gain) follows the law of diminishing marginal utility. By fitting curves to these quantities, we find the appropriate number of early comments (approximately 250) that achieves maximum gain at minimum cost.
3. This is the first work to identify the key factors that affect web traffic from online social networks. Recent changes in Twitter's short URL policy, coupled with the Bitly APIs, make it possible to accurately measure the number of clicks on each short URL published by each Twitter user. Treating accurate click counts as the standard of user influence, we analyze the correlation between accurate click numbers and users' properties, behavior and content topics. Based on a large-scale measurement study, we disprove well-accepted wisdom about how to attract web traffic from online social networks. For example, one commonly accepted idea is that users should increase their followers through reciprocal link exchange and publish tweets containing hashtags. We show that this approach has limited effect and can actually degrade the click-through rate. Instead, URLs in tweets containing mentions achieve a higher click-through rate. As for publication time, although audiences are equally receptive during daytime on weekends and weekdays, users fail to publish more URLs on weekends. As for topics, although users pay similar attention to each topic on Twitter and Facebook, the topics receive disproportionate click-through and retweet rates. Moreover, tweets on overly narrow topics tend to attract less web traffic. Based on these analyses, we extract features for predicting the level of user influence, and the predictive accuracy reaches 82% with a Bagging model.
4. We propose an LDA-based method to find users with similar interests, and an algorithm to search for these users quickly. For online social networks such as Twitter, which lack key information describing user interests, we extract user interests using Latent Dirichlet Allocation (LDA) and define a level of similarity between users to find similar users. Because searching for similar users is too expensive at the scale of online social networks, we first analyze the characteristics of similar users. We find that the majority of similar users are 2 or 3 hops away from the seed user, and that follower relationships exhibit strong homophily in terms of similarity, i.e., j-th hop followers of the seed user who are more similar to it tend to have more similar followers (the seed user's (j+1)-th hop followers). Based on these two characteristics, we propose an algorithm to quickly find users with similar interests. Finally, experiments on large-scale Twitter data clearly show that the LDA-based method outperforms the baselines in finding similar users, and that the search algorithm significantly reduces the amount of computation while achieving the expected performance.
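Contribution 1 reports strong homophily in age, gender and location, but the abstract does not state the exact measure. One common way to quantify it, shown here as a minimal sketch rather than the dissertation's method, compares the observed share of (disseminator, audience) pairs that share an attribute with the share expected if audiences chose disseminators independently of that attribute. The function name `homophily_ratio` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def homophily_ratio(pairs: Iterable[Tuple[Hashable, Hashable]]) -> float:
    """Observed same-attribute share of (disseminator, audience) pairs divided by
    the share expected under random mixing. Values above 1 suggest homophily."""
    pairs = list(pairs)
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one pair")
    observed = sum(1 for d, a in pairs if d == a) / n
    d_counts = Counter(d for d, _ in pairs)
    a_counts = Counter(a for _, a in pairs)
    # Expected same-attribute share if choices ignored the attribute: sum_k p_d(k) * p_a(k)
    expected = sum((d_counts[k] / n) * (a_counts[k] / n) for k in d_counts)
    if expected == 0:
        raise ValueError("disseminators and audiences share no attribute value")
    return observed / expected
```

For instance, the pairs `("M","M"), ("M","M"), ("F","F"), ("M","F")` give an observed share of 0.75 against an expected 0.5, i.e. a ratio of 1.5, while perfectly mixed pairs give 1.0.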
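The after-publication prediction in contribution 2 uses simple linear regression on early comments. The abstract does not say how commenter gender is obtained or how the ratio is encoded, so this sketch assumes each early commenter's gender is known ('M'/'F', anything else unknown), takes the male share among the first k such commenters as the single feature x, and treats the audience gender ratio as a male share in [0, 1]. `fit_slr`, `early_male_share` and `predict_ratio` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def fit_slr(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def early_male_share(commenter_genders: List[str], k: int) -> float:
    """Male share among the first k early commenters whose gender is known."""
    known = [g for g in commenter_genders if g in ("M", "F")][:k]
    if not known:
        raise ValueError("no early commenters with known gender")
    return known.count("M") / len(known)


def predict_ratio(a: float, b: float, x: float) -> float:
    """Apply the fitted line and clip to [0, 1], since the target is a share."""
    return min(1.0, max(0.0, a + b * x))
```

In use, `fit_slr` would be trained on past videos (x from their early comments, y from their observed audience), and a new video's ratio estimated with `predict_ratio(a, b, early_male_share(genders, k))`.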
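Contribution 2 also fits curves relating the number of early comments (cost) to predictive accuracy (gain) and picks the count that balances them. The fitted function form is not given in the abstract; this sketch assumes a logarithmic gain curve, gain(n) = a + b ln n, which has diminishing marginal utility b/n, and an assumed linear cost per comment. Maximizing gain(n) minus cost times n then gives n* = b / cost. `fit_log_curve`, `best_comment_count` and `cost_per_comment` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_log_curve(ns: Sequence[float], gains: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of gain = a + b * ln(n); returns (a, b)."""
    if len(ns) != len(gains) or len(ns) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log(n) for n in ns]  # linearize: regress gain on ln(n)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(gains) / len(gains)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("comment counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, gains))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def best_comment_count(b: float, cost_per_comment: float) -> float:
    """Maximize a + b*ln(n) - cost_per_comment*n.

    Setting the derivative b/n - cost_per_comment to zero gives
    n* = b / cost_per_comment; it is a maximum because the second
    derivative -b/n**2 is negative when b > 0.
    """
    if b <= 0 or cost_per_comment <= 0:
        raise ValueError("expects positive b and positive cost")
    return b / cost_per_comment
```

With an illustrative fitted b of 0.05 and an assumed cost of 0.001 accuracy units per comment, the sketch suggests about 50 comments; these numbers are made up for illustration and do not reproduce the dissertation's figure of approximately 250.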
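Contribution 4 combines LDA topic vectors, a user-similarity measure, and a search restricted by two observations: most similar users are 2 or 3 hops from the seed, and similar followers tend to have similar followers. The sketch below assumes per-user topic distributions have already been inferred by some LDA implementation, uses 1 minus the Jensen-Shannon divergence as a stand-in similarity (the dissertation's own definition is not given in the abstract), and prunes a breadth-first search by expanding only followers whose similarity passes a threshold. `js_similarity`, `find_similar_users`, `max_hops` and `expand_threshold` are hypothetical.

```python
import heapq
import math
from collections import deque
from typing import Dict, List, Sequence, Tuple


def js_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """1 - Jensen-Shannon divergence with log base 2, so the result is in [0, 1]."""
    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0 here.
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 1.0 - 0.5 * kl(p, m) - 0.5 * kl(q, m)


def find_similar_users(
    seed: str,
    followers: Dict[str, List[str]],
    topics: Dict[str, Sequence[float]],
    k: int = 10,
    max_hops: int = 3,
    expand_threshold: float = 0.5,
) -> List[Tuple[float, str]]:
    """Hop-limited BFS over followers that only expands sufficiently similar users."""
    seen = {seed}
    frontier = deque([(seed, 0)])  # (user, hop distance from seed)
    scored: List[Tuple[float, str]] = []
    while frontier:
        user, hop = frontier.popleft()
        if hop >= max_hops:
            continue  # do not look beyond max_hops from the seed
        for f in followers.get(user, []):
            if f in seen or f not in topics:
                continue
            seen.add(f)
            sim = js_similarity(topics[seed], topics[f])
            scored.append((sim, f))
            # Homophily heuristic: followers of similar users are likely similar,
            # so only similar users are expanded to the next hop.
            if sim >= expand_threshold:
                frontier.append((f, hop + 1))
    return heapq.nlargest(k, scored)
```

Lowering `expand_threshold` explores more of the graph (more computation, fewer missed users), while raising it prunes harder; this is the computation-versus-recall trade-off that such a search has to balance.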
Keywords/Search Tags: online social networks, information propagation, user behavior, prediction, influence