
Research Of Related Entity Extraction And Homepages Finding

Posted on: 2014-07-03
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Pan
Full Text: PDF
GTID: 2268330398499404
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the Web has become a large-scale, wide-ranging library of information resources. Entity extraction, which extracts useful information quickly and accurately from huge amounts of data, is a focus of research on information processing, question answering (Q&A) systems, and entity retrieval in China and abroad, and it has become one of the main research goals of successive editions of TREC, the well-known international evaluation conference. The core problem in the TREC Entity 2011 Related Entity Finding (REF) track is to find all entities related to a source entity, given the source entity's name and home pages, the type of the target entity, and the nature of their relation, described in free text. This thesis builds a hierarchical relevance retrieval framework for entity extraction. The model comprises document retrieval, entity extraction, entity filtering, entity ranking, home page finding, and support document finding. It adopts natural language processing technology and a named entity recognition method to find related entities and their home pages that satisfy the input queries. We completed the evaluation task efficiently over the huge data set and ranked second internationally.

This thesis covers the following research aspects:

1. We propose an algorithm for calculating entity ranking scores in the entity extraction stage. The method considers each entity's TF-IDF weight, page rank, the confidence between source and target entities, multiple keywords, and other factors, and combines them by linear weighting into a single scoring formula. The resulting scores of the candidate entities serve as the basis for entity ranking. Experimental results on the TREC Entity 2011 Related Entity Finding task show that the method is effective, reaching a MAP of 0.1266.

2. We put forward an improved algorithm, based on authority pages and page characteristics, for the home page finding stage. The method retrieves 10 pages by submitting the target entity to the ClueWeb09 API and Google search. It then calculates weights from multiple features, such as URL links and page content, and selects the candidate with the highest weight as the home page of the target entity.

3. We design and implement a system for entity extraction and for home page and support document finding, based on a named entity recognition method and on entity home page and support document finding technology. The system ranked second internationally in the TREC Entity 2011 REF task.
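The two scoring steps described in the abstract can be illustrated with a minimal Python sketch. The abstract names the features but gives neither the coefficients nor the exact feature definitions, so everything concrete here is an assumption made for illustration: the weight values, the `Candidate` fields (assumed to be pre-normalized to [0, 1]), the token-overlap features in `homepage_score`, and all function names are hypothetical and are not the thesis's actual formula.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical coefficients; the abstract does not state the real ones.
ENTITY_WEIGHTS = {"tfidf": 0.4, "page_rank": 0.2, "confidence": 0.3, "keywords": 0.1}


@dataclass
class Candidate:
    """A candidate target entity with features assumed normalized to [0, 1]."""
    name: str
    tfidf: float       # TF-IDF weight of the entity
    page_rank: float   # page rank of the page the entity came from
    confidence: float  # confidence between source and target entity
    keywords: float    # match against the query keywords


def entity_score(c, weights=ENTITY_WEIGHTS):
    """Linear weighted combination of the candidate's features."""
    return sum(w * getattr(c, feature) for feature, w in weights.items())


def rank_entities(candidates, weights=ENTITY_WEIGHTS):
    """Sort candidates by descending combined score."""
    return sorted(candidates, key=lambda c: entity_score(c, weights), reverse=True)


def homepage_score(url, page_text, entity_name, w_url=0.6, w_content=0.4):
    """Weight a candidate page by URL and content features.

    Both features are simple stand-ins: the fraction of entity-name
    tokens found in the host name and in the page text.
    """
    tokens = entity_name.lower().split()
    if not tokens:
        return 0.0
    host = urlparse(url).netloc.lower()
    text = page_text.lower()
    url_hit = sum(t in host for t in tokens) / len(tokens)
    content_hit = sum(t in text for t in tokens) / len(tokens)
    return w_url * url_hit + w_content * content_hit


def pick_homepage(pages, entity_name):
    """Return the URL of the highest-weighted (url, text) candidate page."""
    return max(pages, key=lambda p: homepage_score(p[0], p[1], entity_name))[0]
```

In this sketch, `rank_entities` would be applied to the filtered candidate entities, and `pick_homepage` to the 10 pages retrieved for each target entity. In practice the weights would be tuned against relevance judgments rather than fixed by hand.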
Keywords/Search Tags: Entity Extraction, Home Pages Finding, Natural Language Processing Technology, Named Entity Recognition Method, TREC Entity 2011