Research On Encyclopedic Knowledge Bases Oriented Entity-document Relevance Classification

Posted on: 2019-05-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L R Ma
GTID: 1488306470493514
Subject: Computer Science and Technology
Abstract/Summary:
Encyclopedic knowledge bases (such as Wikipedia and Baidu Baike) are of great significance for curating and applying encyclopedic knowledge. They serve as a key platform for people to look up knowledge in daily life. They also provide a knowledge source for applications such as intelligent search engines, knowledge graphs, question answering systems and entity retrieval. It has been reported that there is a median lag of one year between the publication date of a news article and the date on which it is edited into a Wikipedia profile. Keeping the contents of knowledge bases up to date has therefore become very important for their use. To alleviate this serious lag, the Text REtrieval Conference (TREC) launched the Knowledge Base Acceleration Cumulative Citation Recommendation (KBA-CCR) task in 2012, and many renowned universities and research institutions took part. The core of KBA-CCR is analyzing the relevance between entities and documents, which has become a hot research topic in knowledge base acceleration. Depending on how the relevance between documents and entities is modeled, there are two main approaches to the CCR task: classification and learning to rank. Using various features designed by domain experts and powerful machine learning models, these approaches have achieved better or competitive performance in some respects. However, many problems remain to be explored.

In this thesis, we treat the relevance analysis between entities and documents as a classification task, which we call encyclopedic knowledge base oriented entity-document relevance classification. It aims to find documents relevant to target entities in a large text data stream and to classify each entity-document pair according to its relevance. The contributions of this work are summarized as follows:

(1) A document representation model based on target-entity burst features is proposed. Previous work has shown that temporal features play an important role in knowledge base recommendation tasks. The proposed model takes into account the burst features of the entity as well as the semantic features between the entity and the document. Experimental results show that this representation significantly improves entity-document relevance classification performance.

(2) An entity-document class-dependent discriminative mixture model is presented. The relevance classification task should consider the entity and the document together. When an entity's class information is consistent with a document's class information, the document is more likely to be a citation for the target entity. The proposed model jointly considers the prior class information of the entity and of the document. It then uses a hybrid model that combines this prior class information with the semantic information between the entity and the document. Experiments show that the model handles the diversity of entities and documents flexibly. It also copes with entities and documents that do not appear in the training set, demonstrating strong generalization ability.

(3) A classification model incorporating preference information is proposed. Given the number and diversity of entities and documents, labelled data is very limited. Annotation consumes considerable manpower and financial resources, but it is of great value to this research task. To fully exploit the useful information in the annotated data, the thesis presents a preference-enhanced support vector machine (SVM) model. The model considers the differences between samples of different categories as well as the preference information between samples of the same category, and builds a novel SVM that incorporates this preference information. Experimental results show that the preference-enhanced SVM effectively improves classification performance. Moreover, the model is general and can be applied to other tasks.

(4) An entity-document combined deep learning network classification model is presented. Previous work focused on designing entity-document features and selecting suitable models. Such hand-crafted features require considerable effort from domain experts and do not generalize to other applications. The proposed model automatically learns latent features of entities and documents and classifies entity-document pairs in an end-to-end fashion. Experimental results show that the model effectively improves entity-document relevance classification performance and offers a new way to approach the entity-document relevance classification task.
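Contribution (3) describes the preference-enhanced SVM only at a high level. The following is one plausible reading under stated assumptions, not the thesis's own formulation. It assumes binary labels in {-1, +1} (relevant or irrelevant) and a linear SVM trained by full-batch subgradient descent. It also assumes a RankSVM-style pairwise hinge term for preference pairs drawn from the same class. The names `train_preference_svm`, `predict`, `prefs`, and the preference weight `mu` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def train_preference_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    prefs: Sequence[Tuple[int, int]],
    lam: float = 0.01,  # L2 regularisation strength (assumed value)
    mu: float = 1.0,    # weight of the preference term (hypothetical)
    lr: float = 0.1,
    epochs: int = 200,
) -> Tuple[Vector, float]:
    """Full-batch subgradient descent on the assumed objective

        lam/2 * ||w||^2
        + (1/n)      * sum_i     max(0, 1 - y_i * (w.x_i + b))
        + (mu/|P|)   * sum_(i,j) max(0, 1 - w.(x_i - x_j))

    where y_i is in {-1, +1} and each pair (i, j) in P states that
    sample i is preferred to (more relevant than) sample j of the same class.
    """
    n, dim = len(X), len(X[0])
    w: Vector = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wk for wk in w]  # gradient of the L2 regulariser
        grad_b = 0.0
        # Hinge loss on class labels: only margin violators contribute.
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:
                for k in range(dim):
                    grad_w[k] -= yi * xi[k] / n
                grad_b -= yi / n
        # Pairwise hinge loss on within-class preferences (RankSVM-style).
        # The bias b cancels in the score difference, so only w is affected.
        for i, j in prefs:
            diff = [a - c for a, c in zip(X[i], X[j])]
            if dot(w, diff) < 1:
                for k in range(dim):
                    grad_w[k] -= mu * diff[k] / len(prefs)
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 (relevant) or -1 (irrelevant) from the sign of the score."""
    return 1 if dot(w, x) + b >= 0 else -1


if __name__ == "__main__":
    X = [[2.0, 1.0], [3.0, 0.5], [-2.0, -1.0], [-3.0, -0.5]]
    y = [1, 1, -1, -1]
    prefs = [(1, 0)]  # sample 1 judged more relevant than sample 0 (both positive)
    w, b = train_preference_svm(X, y, prefs)
    print([predict(w, b, x) for x in X])
    print(dot(w, X[1]) > dot(w, X[0]))
```

In this toy usage, sample 1 is marked as more relevant than sample 0 even though both are positive. The pairwise term pushes the learned weights to score sample 1 higher, while the ordinary hinge term keeps the two classes separated. Passing an empty `prefs` list reduces the sketch to a plain linear SVM.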
Keywords/Search Tags: Knowledge base cumulative citation recommendation, Entity-citation classification, Entity burst feature, Classification learning