Font Size: a A A

Research On Link-baded Classification In Social Networks

Posted on:2013-01-15Degree:DoctorType:Dissertation
Country:ChinaCandidate:H Y WanFull Text:PDF
GTID:1118330371978044Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Social networks are a ubiquitous paradigm of human interactions in real world. As a burgeoning research area, social network data mining has attracted the attention of scientists in many fields in recent years. Especially the rapid development of online social networks (e.g., Facebook, Twitter, weibo.com, renren.com) has greatly stimulated the study of social networks for the purposes of social service, marketing, and public security.Classification is one of the most important problems in social network data mining. Node classification, relationship labeling, and link prediction in social networks all have wide ranges of application prospects in many areas. The existence of complex autocorrelation dependencies among instances in social networks, i.e., these instances are not independently and identically distributed (IID), makes it unsuitable to use traditional classification methods to classify them. Link-based classification came out to answer the question. It manages to capture the autocorrelation information among network instances, and then by collectively inferring values for the entire set of instances simultaneously, it can achieve greatly improved classification results in social networks.Statistical relational learning (SRL), which is based on probabilistic graphical models, has some inherent advantages for classification in social networks. Probabilistic graphical models are a marriage between probability theory and graph theory. They provide a solid mathematical foundation for uncertainty reasoning. SRL models represent the autocorrelation of relational data (i.e., network data) in certain relational language and are very effective for link-based classification. However, there are still some existing problems for SRL to perform the classification in social networks. First, it is very difficult to precisely capture the autocorrelation information among instances in networks. Second, the current SRL models commonly suffer from the problem of high computational complexity. In this dissertation, we present our work of in-depth studies on these two problems and a research on our applying of link-based classification to a specific field.The main contents and contributions of this dissertation are as follows:(1) We propose a two-stage learning framework for relational Markov networks (RMN) to improve the efficiency of its parameter estimation. As a representative SRL model, RMN uses structural query language to represent autocorrelation and lets user construct the network structure of the model by defining relational clique templates. Thus RMN just needs to learn the parameters rather than the structure. This structural simplicity makes it very easy to be used in practice, but the inefficient parameter estimation hinders its application in large-scale social networks. To improve the efficiency, we propose a two-stage learning framework for learning RMN, in which we divide all the cliques into two categories, i.e., evidence cliques and compatibility cliques. At the first stage the parameters of evidence cliques are estimated in a flat setting. Then at the second stage all the parameters are estimated based on the result of the first stage. Experimental results show that this two-stage learning framework can obviously improve the efficiency of RMN learning.(2) We propose a Community-based Relational Markov Networks (C-RMN) model for node classification in social networks. Community structure is one of the most important properties of social networks. According to the notion of "birds of a feather flock together", we introduce the community information into the definition of relational clique templates, which improves their capability of representing the autocorrelation dependencies. In addition, we propose a Discriminative Maximum Pseudolikelihood Estimation (DMPLE) approach to estimate the parameters of C-RMN. Experimental results on some real-world network datasets show that, compared with the RMN mode, C-RMN obviously improves the classification accuracy, and the DMPLE approach greatly reduces the training time with a minor loss of the prediction accuracy.(3) We propose a community-based relationship labeling approach. Community information is also very critical for the task of relationship labeling in social networks. We use it to construct community-based conditional random field (CRF) relationship labeling model:we first detect the community structure of a social network by using some community detection algorithms, and then construct a CRF of relationship labels based on the detected community information, and finally use the pseudolikelihood technique to estimate the parameters. Experimental results on two real-world social networks show that, the community-based relationship labeling approach outperforms the traditional classifiers as well as the RMN model.(4) To apply the link-based classification to the field of mobile communication, we propose a typed community detection framework based on relationship labeling. Mobile social networks consist of mobile users who communicate with each other using cell phones. Discovering typed communities (e.g., family communities or corporate communities) in mobile social networks is a very promising problem. For example, it can help mobile operators to determine the target users for precision marketing or services. We propose discovering typed communities in mobile social networks by utilizing the labels of relationships between users. We first classify the relationships between connected users by using link-based classification, and then take the relationship labels (with a form of probability) as the weight of relationships, and finally employ some weighted community detection algorithms to discover the typed community structure. The experimental results on real-world mobile social networks show that our proposed framework performs well in terms of the accuracy of typed community detection in mobile social networks.
Keywords/Search Tags:Social Networks, Link-based Classification, Node Classification, Relationship Labeling, Conditional Random Fields, Relational Markov Networks, Community Structure
PDF Full Text Request
Related items