Probabilistic Graphical Modeling On Heterogeneous Social Networks And Applied To Information Retrieval

Posted on: 2010-02-16, Degree: Master, Type: Thesis
Country: China, Candidate: J Zhang
GTID: 2178360278962166, Subject: Computer Science and Technology
Abstract/Summary:
With the development of the Web, especially Web 2.0, social networks consisting of different types of information (i.e., heterogeneous types of Web objects and complex relationships between them) have become popular. Conventional mining approaches cannot be applied directly to these complex heterogeneous social networks, for three reasons: 1) conventional approaches usually describe Web objects at the lexical level and cannot capture their latent semantics; 2) traditional approaches often model the different types of objects separately, and so ignore the dependencies between them; 3) most existing approaches treat all relationships identically, although links are created with various intentions and therefore differ in type and influence.

This thesis tries to overcome these limitations. Specifically, it focuses on how to model heterogeneous social networks with probabilistic graphical models, and how to apply the modeling results to Web object retrieval over heterogeneous networks.

We first study how to retrieve heterogeneous Web objects at the semantic level. Traditional information retrieval methods are usually based on keyword matching (e.g., the language model and the vector space model). They estimate the relevance between query terms and the supporting documents of Web objects at the lexical level. However, when the descriptive information for a Web object is insufficient, keyword-matching methods may lead to ambiguity. We propose a semantic-level retrieval method based on a probabilistic graphical model. The method uses hidden topics to describe both query terms and objects, and thus matches queries and objects semantically.
Experimental results show that the discovered hidden topics effectively improve the search performance for heterogeneous objects.

We further study the dependencies between heterogeneous objects and propose a unified Author-Conference-Topic (ACT) model for heterogeneous networks. The model uses common hidden topics to describe query terms and heterogeneous objects simultaneously. We then apply the modeling results to heterogeneous object retrieval and obtain improved performance. Moreover, we propose a heterogeneous PageRank algorithm, which assigns different weights to different types of Web links. We combine heterogeneous PageRank with the ACT model for object retrieval. Experimental results show that the combined algorithm further improves the search performance for heterogeneous objects.

Users create Web links with different intentions, but traditional methods ignore these differences and treat all links identically. Distinguishing the types and influence of Web links is important for Web mining and analysis, especially for improving the traditional PageRank algorithm; it may also help identify spam links and trace hot topics. This thesis proposes Pairwise Restricted Boltzmann Machines (PRBM) to identify the types and influence of Web links using hidden topic distributions. Compared with conventional methods, the model obtains better performance.

All of this research has been applied to Arnetminer, an academic research network mining system. We have applied the ACT model to the academic network to provide heterogeneous object retrieval (papers, conferences, and authors), which helps users conveniently find the expertise objects they are interested in. We have also applied the PRBM model to the academic network to provide citation graph retrieval, which helps researchers analyze the evolution of research areas.
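The idea of a PageRank variant that weights links by type can be illustrated with a minimal sketch. This is not the algorithm from the thesis, whose details the abstract does not give. It is an ordinary power-iteration PageRank in which each link's share of its source's rank is proportional to a per-type weight. The function name `type_weighted_pagerank`, the link type names, and the weight values are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def type_weighted_pagerank(edges, type_weight, damping=0.85, iters=50):
    """PageRank where outgoing rank is split in proportion to link-type weights.

    edges: list of (source, target, link_type) tuples.
    type_weight: dict mapping link_type to a positive weight (assumed values).
    """
    nodes = sorted({n for s, d, _ in edges for n in (s, d)})
    n = len(nodes)

    # Group outgoing links by source, keeping the weight of each link's type.
    out = defaultdict(list)
    for s, d, t in edges:
        out[s].append((d, type_weight[t]))

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Every node starts each round with the teleportation mass.
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            links = out.get(v)
            if not links:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[v] / n
                for u in nodes:
                    nxt[u] += share
                continue
            total = sum(w for _, w in links)
            for d, w in links:
                nxt[d] += damping * rank[v] * w / total
        rank = nxt
    return rank


# Hypothetical academic network: an author writes two papers, one paper
# cites the other, and both are published in the same conference.
edges = [
    ("author", "paper1", "writes"),
    ("author", "paper2", "writes"),
    ("paper1", "paper2", "cites"),
    ("paper1", "conf", "published_in"),
    ("paper2", "conf", "published_in"),
]
weights = {"writes": 1.0, "cites": 2.0, "published_in": 0.5}  # assumed values
ranks = type_weighted_pagerank(edges, weights)
```

With uniform weights this reduces to standard PageRank, so the type weights are the only place where link heterogeneity enters. How such weights would be chosen or learned is left open here.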
Keywords/Search Tags: heterogeneous network, probabilistic graphical model, hidden topic, object retrieval