Study On Related Entity Finding In Web

Posted on:2014-09-15

Degree:Doctor

Type:Dissertation

Country:China

Candidate:J S Zhang

Full Text:PDF

GTID:1268330401471361

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of the Internet and information retrieval technology, the Web has become an important source of information for users, and search engines have become an important tool for obtaining information from it. In the traditional search paradigm, a user submits a query to a search engine (e.g., Baidu or Google), and the search engine returns a list of relevant documents. However, users often need information about relevant entities rather than a list of documents, so mining the entity information that users want from Web documents has become a research focus. Research on Related Entity Finding (REF) was launched to address this issue. REF is defined as follows: given an input entity, specified by its name and homepage, the type of the target entity, and the nature of their relation, find related entities that are of the target type and stand in the required relation to the input entity.

The returned entities should conform to the target type. However, the given type is often very coarse, which leads to inaccurate judgments of entity type. To address this issue, we did the following work:
(1) We proposed an approach that automatically obtains the fine-grained target type by syntactically parsing the query and its hyponym seed entities with query templates.
(2) We proposed an induction-based approach that uses a small number of seed entities to obtain discriminative rules for the hyponym categories of the target type.
(3) We proposed a feature-based approach that uses a larger number of seed entities, together with features selected by an optimal feature extraction method, to obtain discriminative rules for the hyponym categories of the target type.

The candidate entities initially retrieved by the search engine are unordered. To meet users' requirements, all candidate entities must be ranked. To address this issue, we did the following work:
(1) We proposed an entity ranking approach based on a generative probabilistic model.
It ranks entities by computing a triple combination of entity relevancy, entity-type relevancy and entity-relation relevancy, and selects the best combination method by comparing the alternatives. Entity-type relevancy is computed in two ways, corresponding to the two ways of acquiring the hyponym-category discriminative rules of the target type (one based on induction, the other based on feature extraction). We also evaluated the effect of two smoothing methods on entity ranking, and proposed a method for computing entity-relation relevancy ("cut stop words to rebuild relation") that improved the ranking results and reduced the time cost.
(2) We proposed an entity ranking approach based on Markov random fields. An entity is represented by three properties: a descriptive document, the entity type and the entity name. The method ranks entities by a linear combination, with optimal weight parameters, of three relevancies: between the query and the entity's descriptive document, between the target type and the candidate entity's type, and between the source entity's name and the candidate entity's name.

According to the definition of the REF task, an entity is represented by a unique homepage. To find the homepages of the ranked entities, we proposed an approach based on a linear combination of two relevancy scores: the score of the Web page's multi-feature representation and the external-link score of the entity's Wikipedia page.

The experimental results demonstrate that the proposed approaches effectively accomplish the REF task, significantly reducing users' manual work and returning valid results.
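
The generative triple-combination ranking can be illustrated with a minimal sketch under stated assumptions. The abstract does not give the formulas, so here the three relevancies are treated as probabilities and multiplied (summed in log space). Entity relevancy and entity-relation relevancy are both computed as query likelihoods of the candidate's descriptive text with Dirichlet-prior smoothing (one common choice; which two smoothing methods were compared is not stated), and the relation text is first reduced to its content words as one possible reading of "cut stop words to rebuild relation". The entity-type probability is taken as an input standing in for the hyponym-category discriminative rules. The stop-word list, function names, data format and `mu` values are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the dissertation's list is not given.
STOP_WORDS = {"a", "an", "the", "of", "that", "which", "is", "are", "by", "to", "in", "with"}


def tokenize(text):
    return text.lower().split()


def cut_stop_words(relation):
    """Keep only the content words of the relation description
    (an assumed reading of 'cut stop words to rebuild relation')."""
    return [t for t in tokenize(relation) if t not in STOP_WORDS]


def log_likelihood(terms, doc_terms, coll_tf, coll_len, mu):
    """log P(terms | entity document): unigram model, Dirichlet-prior smoothing."""
    doc_tf = Counter(doc_terms)
    vocab = len(coll_tf)
    total = 0.0
    for t in terms:
        # Add-one background estimate so terms unseen in the collection avoid log(0).
        p_bg = (coll_tf[t] + 1) / (coll_len + vocab + 1)
        total += math.log((doc_tf[t] + mu * p_bg) / (len(doc_terms) + mu))
    return total


def rank(entities, query, relation, type_probs, mu=2000.0):
    """Rank entities by P(query|e) * P(type|e) * P(relation|e), in log space.

    entities: list of {"name": ..., "doc": descriptive text} (hypothetical format).
    type_probs: name -> probability that the entity has the fine-grained target
    type (a stand-in for the output of the hyponym-category rules).
    """
    docs = {e["name"]: tokenize(e["doc"]) for e in entities}
    coll_tf = Counter(t for d in docs.values() for t in d)
    coll_len = sum(coll_tf.values())
    q_terms, r_terms = tokenize(query), cut_stop_words(relation)
    scores = {}
    for name, d in docs.items():
        entity_rel = log_likelihood(q_terms, d, coll_tf, coll_len, mu)
        type_rel = math.log(type_probs[name])
        relation_rel = log_likelihood(r_terms, d, coll_tf, coll_len, mu)
        scores[name] = entity_rel + type_rel + relation_rel
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    entities = [
        {"name": "Lufthansa", "doc": "lufthansa is a german airline that operates boeing 747 aircraft"},
        {"name": "Boeing", "doc": "boeing is an aircraft manufacturer that builds the 747"},
        {"name": "Berlin", "doc": "berlin is the capital city of germany"},
    ]
    type_probs = {"Lufthansa": 0.9, "Boeing": 0.2, "Berlin": 0.05}  # made-up values
    # The toy documents are tiny, so a small mu keeps term evidence visible.
    print(rank(entities, "boeing 747 airline", "airline that operates the aircraft", type_probs, mu=5.0))
    # ['Lufthansa', 'Boeing', 'Berlin']
```

Working in log space avoids underflow when many query terms are multiplied. Values of `mu` in the low thousands are commonly used for full-length Web documents; for very short texts like the toy ones above, a small value keeps the background model from swamping the document evidence.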

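Both the Markov-random-field ranking and the homepage-finding step are described as a linear combination of relevancy scores with optimal weights. The sketch below shows one way such weights could be obtained, assuming they are found by grid search over weights that sum to 1, maximizing precision at rank 1 on training queries with known answers. The abstract does not say how the optimal weights were determined, and the feature values, function names and candidate ids here are hypothetical. The same functions apply to the three MRF relevancies (query vs. descriptive document, target type vs. candidate type, source name vs. candidate name) by setting `n_features=3`.

```python
from itertools import product


def linear_score(features, weights):
    """Weighted linear combination of relevancy scores."""
    return sum(w * f for w, f in zip(weights, features))


def rank_by(candidates, weights):
    """Candidate ids sorted by descending combined score."""
    return sorted(candidates, key=lambda c: linear_score(candidates[c], weights), reverse=True)


def precision_at_k(ranked, relevant, k):
    return sum(1 for c in ranked[:k] if c in relevant) / k


def tune_weights(training, n_features, steps=11, k=1):
    """Grid-search weights summing to 1 that maximize mean P@k over training queries.

    training: list of (candidates, relevant) pairs; candidates maps a candidate id
    to a tuple of n_features scores, relevant is the set of correct ids.
    """
    grid = [i / (steps - 1) for i in range(steps)]
    best_w, best_score = None, -1.0
    for w in product(grid, repeat=n_features):
        if abs(sum(w) - 1.0) > 1e-9:  # restrict to the weight simplex
            continue
        score = sum(precision_at_k(rank_by(c, w), rel, k) for c, rel in training) / len(training)
        if score > best_score:  # strict: keep the first best weight vector found
            best_w, best_score = w, score
    return best_w, best_score


if __name__ == "__main__":
    # Homepage finding for one entity:
    # (multi-feature page score, Wikipedia external-link score), made-up values.
    pages = {
        "official_site": (0.7, 0.7),
        "wikipedia_article": (0.9, 0.2),
        "fan_blog": (0.3, 0.9),
    }
    weights, p1 = tune_weights([(pages, {"official_site"})], n_features=2)
    print(weights, p1)                 # (0.4, 0.6) 1.0
    print(rank_by(pages, weights)[0])  # official_site
```

Neither score alone picks the official site here, which is the situation a combined score is meant to handle. Grid search is used only for readability; with more features or finer grids, coordinate ascent or a learning-to-rank method is the usual substitute, and in practice the weights would be tuned over many training queries rather than a single entity.
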
Related items

1	Theory And Key Techniques Of Entity Retrieval
2	English Entity Answer Extraction And Home Find
3	Research Of Related Entity Extraction And Homepages Finding
4	Research On The Approach Of Entity Type Inference In Wikipedia
5	Related Entity Finding And Homepage Finding
6	The Research Of Scene Related Entity Reasoning Based On Word Frequency
7	Entity Extraction and Disambiguation in Short Text Using Wikipedia and Semantic User Profiles
8	Research On Chinese Entity Relation Extraction Based On Schemas And Pre-trained Language Models
9	Research And Implementation On Method Of Entity Linking Based On Wikipedia
10	Research And Implementation Of Named Entity Disambiguation Based On Wikipedia