Study On Related Entity Finding In Web

Posted on: 2014-09-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J S Zhang
Full Text: PDF
GTID: 1268330401471361
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet and information retrieval technology, the Web has become an important channel through which users obtain information, and search engines have become an important tool for retrieving it. In the traditional search model, a user submits a query to a search engine (e.g. Baidu, Google) and the search engine returns a list of relevant documents. However, users often need information about relevant entities rather than a list of documents, so how to mine the entity information that users want from Web documents has become a research focus. Research on REF (Related Entity Finding) was launched to address this issue. REF is defined as follows: given an input entity (specified by its name and homepage), the type of the target entity, and the nature of their relation, find related entities that are of the target type and stand in the required relation to the input entity.

The returned entities should conform to the target type. However, the given type is often very coarse, which leads to inaccurate judgments of entity type. To address this issue, we did the following work: (1) We proposed an approach that automatically obtains the fine-grained target type through syntactic parsing of the query, and its hyponym seed entities using query templates. (2) We proposed an induction-based approach that uses a small number of seed entities to obtain discriminative rules for the hyponym categories of the target type. (3) We proposed a feature-based approach that uses a larger number of seed entities, with features obtained by an optimal feature extraction method, to obtain discriminative rules for the hyponym categories of the target type.

The initial entities retrieved by the search engine are unordered. To meet users’ requirements, all candidate entities must be ranked. To address this issue, we did the following work: (1) We proposed an entity ranking approach based on a generative probabilistic model. It ranks entities by computing a combination of three components (entity relevance, entity-type relevance and entity-relation relevance) and identifies the best way to combine them by comparing the alternatives. It uses two methods to calculate entity-type relevance, which differ in how the discriminative rules for the target type’s hyponym categories are acquired (one by induction, the other by feature extraction). We also evaluated the effect of two smoothing methods on entity ranking and proposed a method ("cut stop words to rebuild relation") for calculating entity-relation relevance, which improved the ranking results and reduced the time cost. (2) We proposed an entity ranking approach based on Markov random fields. An entity is represented by three properties: a descriptive document, the entity type and the entity name. This method ranks entities by a linear combination, with optimized weight parameters, of three scores: the relevance between the query and the entity’s descriptive document, the relevance between the target type and the candidate entity’s type, and the relevance between the source entity’s name and the candidate entity’s name.

According to the definition of the REF task, an entity is represented by a unique homepage. To find the homepages of the ranked entities, we proposed an approach based on a linear combination of two relevance scores: the score of the Web page’s multi-feature representation and the external-link score of the entity’s Wikipedia page.

The experimental results demonstrate that the proposed approaches can effectively accomplish the REF task. They can significantly reduce users’ manual work and return valid results.
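
As an illustration only, the linear-combination ranking described for the Markov random field approach might be sketched as follows. This is a minimal sketch, not the dissertation's implementation: the Candidate class, the rank_candidates function, the example scores and the weight values are all hypothetical, and the three relevance scores are assumed to have been computed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate entity with three precomputed relevance scores (hypothetical)."""
    name: str
    doc_score: float   # query vs. the entity's descriptive document
    type_score: float  # target type vs. the candidate entity's type
    name_score: float  # source-entity name vs. the candidate entity's name


def rank_candidates(candidates, weights=(0.5, 0.3, 0.2)):
    """Sort candidates by a weighted linear combination of their three scores.

    The default weights are placeholders; the abstract says only that the
    weights are optimized, not what their values are.
    """
    w_doc, w_type, w_name = weights

    def combined(c):
        return w_doc * c.doc_score + w_type * c.type_score + w_name * c.name_score

    # Highest combined score first.
    return sorted(candidates, key=combined, reverse=True)


# Made-up scores, purely for illustration.
candidates = [
    Candidate("Entity A", 0.9, 0.2, 0.1),
    Candidate("Entity B", 0.6, 0.9, 0.5),
    Candidate("Entity C", 0.3, 0.4, 0.9),
]
ranked = rank_candidates(candidates)
print([c.name for c in ranked])
```

Changing the weights changes the ordering, which is why the choice of weight parameters matters for a combination of this kind.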
Keywords/Search Tags: Related entity finding, Type refinement, Entity ranking, Homepage finding, Text feature extraction, Language model, Wikipedia