Entity Disambiguation And Its Application On Image Search

Posted on: 2014-01-22  Degree: Master  Type: Thesis
Country: China  Candidate: K Q Zhao  Full Text: PDF
GTID: 2248330392960938  Subject: Computer technology
Abstract/Summary:
Entity disambiguation is a significant problem in text understanding. Unlike traditional word sense disambiguation, entity disambiguation tries to disambiguate phrases rather than single words, and in most cases these phrases are noun phrases. Wikification is a kind of entity disambiguation that links phrases to concepts in an external knowledge base, Wikipedia. Existing wikification methods either use a bag-of-words model to represent the context of the phrase to be disambiguated and the context of Wikipedia concepts, or use the link structure of Wikipedia. With the bag-of-words model, a phrase can have a meaning different from that of its individual words, so the model cannot capture the semantics of the text accurately. The link-structure-based methods also do not work well, because links in Wikipedia are sparsely distributed. In this work, we present a framework for entity disambiguation based on link/concept co-occurrence in Wikipedia, together with an iterative method to enrich the link structure of Wikipedia. In addition, we apply our entity disambiguation techniques to web image search. Web image search engines return results in which different entities are mixed together, which leads to a poor user experience: it is inconvenient for users to look for images of a particular entity. Our goal is to group the search results into semantic clusters. Existing methods do not work well on this task for three reasons. First, techniques for visual recognition of objects are still immature. Second, modeling the text context with bag-of-words is insufficient for understanding the context. Finally, there is a disconnect between visual cues and the actual semantics of the image. We propose a clustering framework based on entity disambiguation. Benefiting from the high accuracy (over 90% precision and recall on news articles) of our entity disambiguation approach, the clustering framework outperforms the best competitor by 20.8% in F1-measure and by 41.3% in NMI score. In addition, based on the entity disambiguation process, our image clustering algorithm can generate a set of Wikipedia concepts to describe each image cluster.
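
The abstract does not spell out how co-occurrence is scored, so the following is only a minimal sketch of the general idea, not the thesis's actual algorithm. It assumes that each Wikipedia page is reduced to the list of concepts it links to, and that a candidate concept is preferred when it co-occurs more often with the concepts already found in the context. The function names (`build_cooccurrence`, `disambiguate`) and the toy pages are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(linked_pages):
    """Count how often two concepts are linked from the same page.

    `linked_pages` is a list of concept lists, one per page (an assumed
    simplification of Wikipedia's link structure).
    """
    counts = Counter()
    for concepts in linked_pages:
        # Sort so each unordered pair is stored under one canonical key.
        for a, b in combinations(sorted(set(concepts)), 2):
            counts[(a, b)] += 1
    return counts


def cooccurrence(counts, a, b):
    """Look up the co-occurrence count of an unordered concept pair."""
    return counts[tuple(sorted((a, b)))]


def disambiguate(candidates, context_concepts, counts):
    """Pick the candidate concept that co-occurs most with the context."""
    def score(candidate):
        return sum(cooccurrence(counts, candidate, c) for c in context_concepts)
    return max(candidates, key=score)


# Hypothetical toy data: concepts linked from four imaginary pages.
pages = [
    ["Apple Inc.", "iPhone", "Steve Jobs"],
    ["Apple Inc.", "iPhone"],
    ["Apple (fruit)", "Orchard", "Pie"],
    ["Apple (fruit)", "Orchard"],
]
counts = build_cooccurrence(pages)
best = disambiguate(["Apple Inc.", "Apple (fruit)"], ["iPhone", "Steve Jobs"], counts)
print(best)
```

In this sketch the phrase "apple" has two candidate concepts, and the context concepts decide between them by summed pair counts. A real system would likely normalize the counts and handle candidates with no co-occurrence evidence; those choices are left open here.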
Keywords/Search Tags: Entity disambiguation, Knowledge base, Image search