The development of social media has enabled the collection of behavioral data of unprecedented size and complexity. Social platforms have recognized that the millions or billions of behavioral records they hold carry great scientific and marketing value. Accurate prediction and detection of user behavior are key techniques for many social media applications, such as recommender systems, personalized search and social marketing. Behavioral analysis and modeling is the starting point of these techniques, and it has become one of the newest and most important research problems in computer science. Researchers face a number of challenges brought by complex social media environments, including high sparsity, heterogeneity and abnormality. Traditional behavioral models do not take the complex characteristics or mechanisms of user behavior into consideration, so they fail to provide effective prediction and detection. This thesis studies contextual, cross-domain/cross-platform and suspicious behavioral patterns, develops a series of novel data mining techniques, and provides behavioral models together with behavioral prediction and detection methods. The main contributions are summarized as follows.

1. Proposing information adoption behavior models based on social contexts and spatial-temporal contexts. The social contextual model (ContextMF) incorporates two factors, personal preference and interpersonal influence, to predict article sharing and message retweeting behaviors. Experiments demonstrate that this model performs much better than models that use only one of the two factors. This thesis also proposes flexible multi-faceted evolutionary analysis (FEMA) for dynamic behavior prediction in spatial and temporal environments. Large-scale experiments show that this method significantly improves prediction performance and speeds up incremental learning.

2. Proposing transfer learning algorithms for cross-domain and cross-platform behaviors in social media. Social media users are active in multiple domains and on multiple platforms to fulfill their information needs. To address the high sparsity and cold-start problems of a single domain or a single platform, this thesis proposes to use the social domain to bridge multiple domains within one platform, and to use overlapping users to bridge multiple platforms. It demonstrates that knowledge transfer from auxiliary domains and auxiliary platforms can significantly improve behavioral prediction performance in the target domain and on the target platform. Experiments on real data show that the HybridRW and XPTrans algorithms achieve breakthrough performance in predicting the behavior of cold-start users.

3. Proposing suspicious behavior analysis and a suspiciousness metric based on synchronicity and density. Fraudsters, spammers and zombie followers threaten the order and user experience of social media. This thesis captures the synchronized and lockstep characteristics of their behavior and proposes two scalable, effective suspicious behavior detection algorithms, CatchSync and LockInfer. These algorithms catch fraud and spam and recover distorted degree distributions. They outperform content-based methods and are complementary to them. Furthermore, the thesis proposes a novel metric based on probability theory to evaluate suspiciousness in multi-modal behavioral data. CrossSpot, a local search algorithm based on this metric, can effectively catch information manipulation behaviors in large-scale real social media datasets.
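
As an illustration of how a probability-based suspiciousness score for a block of multi-modal behavioral data might be computed, the following Python sketch scores a sub-block by the negative log-probability of its observed event count under a Poisson null model in which events are spread uniformly over the whole dataset. The null model, the function name `block_suspiciousness` and its parameters are assumptions made for this illustration; the metric defined in the thesis may differ.

```python
import math


def block_suspiciousness(block_mass, block_dims, total_mass, total_dims):
    """Score a sub-block of a multi-modal count tensor (illustrative only).

    block_mass : number of events observed inside the sub-block
    block_dims : size of the sub-block along each mode, e.g. (users, items)
    total_mass : number of events in the whole dataset
    total_dims : size of the whole dataset along each mode

    Assumption: under the null model, the block's event count is Poisson
    with mean total_mass * (fraction of the total volume the block covers).
    The score is -log P(count == block_mass) under that model.
    """
    volume_fraction = 1.0
    for n, N in zip(block_dims, total_dims):
        volume_fraction *= n / N
    expected = total_mass * volume_fraction
    # -log of the Poisson pmf: lambda - c*log(lambda) + log(c!)
    return (expected
            - block_mass * math.log(expected)
            + math.lgamma(block_mass + 1))


# A 10 x 10 block in a 1000 x 1000 dataset with 10,000 events is
# expected to hold about 1 event under the null model.
typical = block_suspiciousness(1, (10, 10), 10_000, (1000, 1000))
dense = block_suspiciousness(50, (10, 10), 10_000, (1000, 1000))
print(typical < dense)
```

Under these assumptions, a block holding 50 events where about one is expected scores far higher than a block holding the expected count. Because the score measures how unlikely the exact count is, it also rises for blocks that are much emptier than expected, so a practical detector would combine it with a check that the block is denser than the background.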