Topic models for link prediction in document networks

Posted on:2013-03-18

Degree:Ph.D

Type:Thesis

University:The Pennsylvania State University

Candidate:Kataria, Saurabh

GTID:2458390008468803

Subject:Information Technology

Abstract/Summary:

The recent explosive growth of interconnected document collections, such as citation networks, networks of web pages, and content generated by crowd-sourcing in collaborative environments, has posed several challenging problems for the data mining and machine learning community. One central problem in the domain of document networks is link prediction: predicting links between any two documents or document-centric entities, such as authors, based on the links already present in a given network. It is a fundamental problem, since several applications can be cast as a particular flavour of link prediction: recovering missing links among entities in a given network of documents, recommending citations to research professionals, recommending collaborators to authors, discovering influential authors or bloggers in research articles or web logs respectively, studying the propagation of ideas and opinions in evolving collections of research documents or news media, and disambiguating references to people mentioned in news articles. This thesis studies the following three link-prediction-based research problems in document networks: (i) Who influences others' actions in a collaborative research environment? (ii) Which documents get cited by a document that joins a citation network? (iii) Which is the correct entity for an entity mention in free text?

Among the various computational methods for solving domain-specific link prediction problems, techniques based on statistical machine learning are increasingly accepted, owing to their ability to model complex relationships among documents and document-centric entities, and to dedicated efforts from the research community to make the resulting intractable inference computationally scalable. This thesis proposes two types of statistical models: (1) models that mimic the generation process of document networks, e.g. citation networks of scientific documents, interconnected blog articles, and web pages; and (2) models that are capable of incorporating task-specific features as supervision. The proposed statistical models are extensions of Latent Dirichlet Allocation (LDA), a topic model. In this work, I show how topic models can be adapted for the above-mentioned link prediction problems. The proposed techniques outperform previous approaches on these link prediction problems.
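
As a rough illustration of how topic proportions can feed a link prediction score, the sketch below ranks candidate documents for citation by how close their topic distributions are to that of a new document. This is not the thesis's model: it assumes the per-document topic proportions have already been inferred by some topic model such as LDA, and it uses Hellinger distance as a simple, generic similarity measure. The function names, document identifiers, and topic vectors are all hypothetical.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions.

    Returns 0.0 for identical distributions and 1.0 for distributions
    with no overlapping mass.
    """
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * squared)


def rank_candidate_links(query_topics, candidate_topics):
    """Rank candidate documents by topic similarity, most similar first.

    query_topics: topic proportions of the new document (assumed given).
    candidate_topics: dict mapping document id -> topic proportions.
    The score 1 - Hellinger distance is an illustrative choice only.
    """
    scored = [
        (doc_id, 1.0 - hellinger(query_topics, theta))
        for doc_id, theta in candidate_topics.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical topic proportions over three topics.
theta_new = [0.7, 0.2, 0.1]
theta_existing = {
    "doc_a": [0.6, 0.3, 0.1],
    "doc_b": [0.1, 0.1, 0.8],
    "doc_c": [0.3, 0.4, 0.3],
}

ranking = rank_candidate_links(theta_new, theta_existing)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

A similarity score like this ignores the existing link structure and any task-specific supervision, which is exactly the information the models described in the abstract are meant to incorporate.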

Keywords/Search Tags:

Link prediction, Document, Networks, Topic models, Problem, Citation

Related items

1	Topic models and dynamic prediction models and their applications in document retrieval and healthcare
2	Research On Methods Of Link Prediction In Social Networks
3	Research On Ranking Topic Models And Their Applications
4	Literature Topic Extracting Based On Weighted Semantic And Citation Relation
5	Research On Topic-based Dynamic Link Prediction Method
6	The Research Of Probabilistic Topic Models And Their Application In Relational Text Classification
7	Research Of Link Prediction And Routing Strategy In Opportunistic Network
8	Interdisciplinary Citation Network Link Prediction
9	Interdisciplinary Topic Identification Based On Citation Relation And Citation Content
10	Citation Behavior Analysis Of Chinese Documents