
Cumulative Citation Recommendation For Online Knowledge Base

Posted on: 2016-02-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J G Wang
GTID: 1108330503955327
Subject: Computer application technology
Abstract/Summary:
With the rapid growth of the Internet, the way people manage and acquire knowledge has shifted from offline to online. Online knowledge bases (KBs) such as Wikipedia and Freebase have become vital data sources for many web applications. These KBs are usually organized around entities such as persons, organizations, and locations. Currently, KB maintenance relies mainly on human editors. However, with the explosion of information, human editors alone can hardly keep large-scale KBs up to date. Less popular entities attract less attention than popular ones, so they are often not updated in time, and an outdated KB severely limits the effectiveness of the applications that depend on it. This gap could be bridged if documents relevant to KB entities were detected automatically as soon as they appear online and then recommended to editors with graded levels of relevance. This task is called Cumulative Citation Recommendation (CCR). The contributions of this thesis are summarized as follows.

First, the thesis introduces the background of CCR and related research areas in detail. It surveys the mainstream approaches to CCR, namely unsupervised, semi-supervised, and supervised learning, and discusses the advantages and disadvantages of each.

Second, the thesis focuses on supervised learning methods for CCR, including entity-centric query expansion, classification, and learning to rank. It also proposes semantic and temporal features for these supervised methods, and experiments on the TREC KBA 2013 dataset evaluate the effectiveness of these new features.

Third, to address the lack of training data for less popular entities in CCR, a global discriminative model is built as a baseline by training a single global classifier (or ranker) on all training data, regardless of the relationships among entities. However, such a global model cannot guarantee satisfactory performance for every entity.
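The global baseline described above can be illustrated with a rough sketch. The code pools the labelled examples of every target entity into one training set and fits a single logistic-regression classifier with plain stochastic gradient descent. The feature vectors stand in for features such as the semantic and temporal features mentioned above. The function names are hypothetical, and logistic regression is an assumed choice. The thesis's actual classifier or ranker may differ.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical layout: entity id -> list of (feature_vector, label) pairs,
# where label 1 means the document is relevant to that entity.
TrainingData = Dict[str, List[Tuple[List[float], int]]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_global_classifier(data: TrainingData, epochs: int = 200,
                            lr: float = 0.1) -> List[float]:
    """Fit ONE logistic-regression model on the examples of ALL entities pooled
    together, ignoring which entity each example belongs to (global baseline)."""
    pooled = [ex for examples in data.values() for ex in examples]
    dim = len(pooled[0][0])
    w = [0.0] * (dim + 1)  # the last weight is the bias term
    for _ in range(epochs):
        for x, y in pooled:
            # zip stops at len(x), so the bias w[-1] is added separately
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])
            g = p - y  # gradient of the log-loss with respect to the logit
            for i in range(dim):
                w[i] -= lr * g * x[i]
            w[-1] -= lr * g
    return w


def score(w: List[float], x: List[float]) -> float:
    """Relevance probability of a document feature vector under the global model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])
```

Every entity is scored with the same weight vector. Under this sketch, that is exactly why a global model can underfit entities whose relevant documents look unlike the majority.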
To address this, the thesis proposes an entity class-dependent discriminative mixture model. It introduces a latent class layer to model the correlations between target entities and latent entity classes. The model adapts better to different types of entities and performs better across a broad range of entities.

Fourth, both the global model and the entity class-dependent mixture model ignore the prior knowledge embedded in documents, so the quality of the recommended documents cannot be guaranteed. The thesis therefore proposes a document class-dependent discriminative mixture model, which introduces a latent layer to capture the correlations between documents and their underlying classes. This model adapts better to different types of documents and handles a broad range of documents more flexibly. Experimental results show that the document class-dependent mixture model improves the precision and accuracy of CCR.

Fifth, the thesis studies cold-start CCR, in which target entities are selected from document streams rather than from a reference KB. Because there is no KB profile from which to extract semantic features, the feature space is too sparse to build a satisfactory relevance model. To solve this problem, the thesis proposes an event-based sentence clustering method and extracts sentence-level features for document ranking. These new features are shown to be effective for cold-start CCR.
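The entity class-dependent and document class-dependent mixture models described above share one structure. A latent class layer softly assigns each input to K classes, and each class has its own discriminative classifier. One plausible formulation is P(relevant | e, d) = sum over k of P(k | z) * sigmoid(w_k · x(e, d)). Here z is the gating input, which would be entity features for the entity class model or document features for the document class model. The minimal sketch below shows only the prediction step. It assumes a softmax gate and logistic-regression components, and all names are hypothetical. The abstract does not specify the exact parameterization or how the parameters are learned (for example, by EM or gradient ascent).

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_memberships(gate_weights: Sequence[Sequence[float]],
                      z: Sequence[float]) -> List[float]:
    """P(k | z) for each latent class k: a softmax over linear gating scores.
    z = entity features (entity class model) or document features
    (document class model); both choices are illustrative assumptions."""
    return softmax([dot(v_k, z) for v_k in gate_weights])


def mixture_relevance(gate_weights: Sequence[Sequence[float]],
                      expert_weights: Sequence[Sequence[float]],
                      z: Sequence[float], x: Sequence[float]) -> float:
    """P(relevant | e, d) = sum_k P(k | z) * sigmoid(w_k . x), where x is the
    feature vector of the (entity, document) pair and w_k is class k's classifier."""
    memberships = class_memberships(gate_weights, z)
    return sum(p_k * sigmoid(dot(w_k, x))
               for p_k, w_k in zip(memberships, expert_weights))
```

With a single latent class (K = 1), the gate always returns probability 1, and the model collapses to one global classifier like the baseline in the third contribution. Adding classes lets different kinds of entities or documents rely on different weight vectors.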
Keywords/Search Tags: Knowledge Base Acceleration, Cumulative Citation Recommendation, Information Filtering, Mixture Model, Cold Start