Research Of Related Entity Extraction And Homepages Finding

Posted on:2014-07-03

Degree:Master

Type:Thesis

Country:China

Candidate:Z C Pan

Full Text:PDF

GTID:2268330398499404

Subject:Computer application technology

Abstract/Summary:

With the rapid development of the Internet, the Web has become a large-scale, wide-ranging information resource library. Entity extraction, which quickly and accurately extracts useful information from huge amounts of data, is a focus of research on information processing, question answering (Q&A) systems and entity retrieval in China and abroad, and it has become one of the main research goals of TREC, the well-known international evaluation conference. The core problem of the TREC Entity 2011 Related Entity Finding (REF) track is to find all entities related to a source entity, given the source entity's name and homepage, the type of the target entity, and the nature of their relation described in free text. This paper builds a hierarchical relevance retrieval framework for entity extraction that comprises document retrieval, entity extraction, entity filtering, entity ranking, homepage finding and support document finding. The model uses Natural Language Processing techniques and Named Entity Recognition to find the related entities, and their homepages, that satisfy each input query. We completed the evaluation task efficiently over this large data set and ranked second internationally.

This paper mainly covers the following research aspects:

1. We propose an algorithm for calculating entity scores for entity ranking in the entity extraction stage. The method combines the entity's TF-IDF weight, PageRank, the confidence between the source and target entities, multiple keywords and other features into a standard scoring formula by linear weighting; the resulting scores of the candidate entities serve as the basis for ranking them. Experiments on the TREC Entity 2011 Related Entity Finding task show that the method works well, reaching a MAP of 0.1266.

2. We put forward an improved homepage-finding algorithm based on authority pages and page characteristics. The method retrieves 10 pages by querying the ClueWeb09 API and Google Search with the target entity, then calculates a weight for each page from multiple features such as URL links and page content. The candidate page with the highest weight is selected as the target entity's homepage.

3. We design and implement a system for entity extraction, homepage finding and support document finding, based on Named Entity Recognition methods and entity homepage and support document finding techniques. The system ranked second internationally in the TREC Entity 2011 REF task.
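
The linear-weighted ranking score in item 1 can be illustrated with a minimal Python sketch. The abstract does not give the weights or say how the features are scaled, so the weight values, the feature names (`tfidf`, `pagerank`, `confidence`, `keywords`) and the min-max normalization below are all assumptions for illustration, not the thesis's actual formula.

```python
from dataclasses import dataclass

# Hypothetical weights: the abstract says the features are combined linearly
# but does not give the coefficients, so these values are placeholders.
WEIGHTS = {"tfidf": 0.4, "pagerank": 0.2, "confidence": 0.3, "keywords": 0.1}


@dataclass
class Candidate:
    name: str
    features: dict[str, float]  # raw feature values keyed like WEIGHTS


def min_max_normalize(cands: list[Candidate]) -> list[dict[str, float]]:
    """Rescale each feature to [0, 1] across candidates (assumed step)."""
    normalized: list[dict[str, float]] = [dict() for _ in cands]
    for key in WEIGHTS:
        values = [c.features.get(key, 0.0) for c in cands]
        lo, hi = min(values), max(values)
        span = hi - lo
        for i, v in enumerate(values):
            # A feature with identical values everywhere cannot discriminate.
            normalized[i][key] = (v - lo) / span if span > 0 else 0.0
    return normalized


def rank_entities(cands: list[Candidate]) -> list[tuple[str, float]]:
    """Score = sum_k w_k * f_k over normalized features; highest score first."""
    if not cands:
        return []
    scored = []
    for cand, feats in zip(cands, min_max_normalize(cands)):
        score = sum(w * feats[k] for k, w in WEIGHTS.items())
        scored.append((cand.name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With two candidates where one is higher on every feature except PageRank, that candidate ranks first with a score of about 0.8 under these placeholder weights, and the other scores about 0.2. Normalizing first keeps a large raw TF-IDF value from swamping the other terms; whether the thesis normalizes its features is not stated.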
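
The homepage-selection step in item 2 can likewise be sketched: score each retrieved candidate page by a weighted sum of URL and content features and keep the best one. The specific features (entity-token overlap with the URL, title and text, URL depth, and search-result position as a stand-in for page authority), their weights, and the `Page`, `page_weight` and `find_homepage` names are hypothetical; the pages are passed in directly rather than fetched from the ClueWeb09 API or Google Search.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Hypothetical weights: the abstract mentions URL and page-content features
# but gives no formula, so these features and values are assumptions.
W_URL, W_TITLE, W_CONTENT, W_RANK = 0.4, 0.3, 0.2, 0.1


@dataclass
class Page:
    url: str
    title: str
    text: str
    rank: int  # 1-based position in the search results (1 = top hit)


def tokens(s: str) -> set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def overlap(entity_toks: set[str], other: set[str]) -> float:
    """Fraction of the entity's tokens that also occur in `other`."""
    if not entity_toks:
        return 0.0
    return len(entity_toks & other) / len(entity_toks)


def page_weight(entity: str, page: Page, max_rank: int = 10) -> float:
    """Weighted sum of simple homepage-likeness features, roughly in [0, 1]."""
    ent = tokens(entity)
    parsed = urlparse(page.url)
    # URL feature: entity tokens in host/path, discounted for deep paths,
    # since homepages tend to sit near a site root (an assumption).
    depth = len([part for part in parsed.path.split("/") if part])
    url_feat = overlap(ent, tokens(parsed.netloc + " " + parsed.path)) / (1 + depth)
    title_feat = overlap(ent, tokens(page.title))
    content_feat = overlap(ent, tokens(page.text))
    # Search position as a crude proxy for page authority; clamped at 0.
    rank_feat = max(0.0, (max_rank - page.rank + 1) / max_rank)
    return (W_URL * url_feat + W_TITLE * title_feat
            + W_CONTENT * content_feat + W_RANK * rank_feat)


def find_homepage(entity: str, pages: list[Page]) -> Optional[str]:
    """Return the URL of the highest-weighted candidate page, or None."""
    if not pages:
        return None
    best = max(pages, key=lambda p: page_weight(entity, p))
    return best.url
```

For the entity "Apache Hadoop", a candidate list containing https://hadoop.apache.org/ (second hit) and a Wikipedia article (first hit) selects the project site, because a shallow URL whose host contains the entity tokens outweighs the better search position under these assumed weights.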

Keywords/Search Tags:

Entity Extraction, Home Pages Finding, Natural Language Processing Technology, Named Entity Recognition Method, TREC Entity 2011

Related items

1	English Entity Answer Extraction And Home Find
2	Research Of Named Entity Relation Extraction Method Based On Bootstrapping
3	Study On Related Entity Finding In Web
4	Domain Adaptation Research And Application Of Named Entity Recognition
5	Research And Application Of Domain Oriented Entities And Inter-entity Relations
6	Theory And Key Techniques Of Entity Retrieval
7	Research And Implementation Of Mining Bilingual Named Entities From Large-Scale Web Pages
8	Study On Recognition Of Chinese Agricultural Named Entity With CRF
9	A Study On Chinese Named Entity Recognition
10	Research On Sentence-level Entity Relationship Extraction With Thai Features