
Mining Topical Communities From Linked Corpus

Posted on: 2013-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: G Q Zheng
GTID: 2218330362459266
Subject: Computer application technology
Abstract/Summary:
Community discovery on large-scale linked document corpora has been a hot research topic for decades. There are two types of links. The first, which we call d2d-link, indicates connections among different documents, such as blog references and research paper citations. The other, which we call u2u-link, represents the co-occurrence or simultaneous participation of different users in one document; typically, each document in a u2u-link corpus has more than one user/author. Examples of u2u-link data include email archives and research paper co-authorship networks. Community discovery in d2d-link data has achieved much success, while methods for u2u-link data either make no use of the textual content of the documents or make oversimplified assumptions about the users and the textual content. In this paper we propose a general approach to community discovery for u2u-link data, i.e., multiple-user data, by placing topical variables on multiple authors' participations in documents. Experiments on a research proceedings co-authorship corpus and a New York Times news corpus show the effectiveness of our model.
Keywords/Search Tags: Topics on Participations, Community discovery, Nonparametric statistical model, Hierarchical Dirichlet Process