Research On Several Key Issues On TAC-KBP Evaluation

Posted on: 2012-07-23 Degree: Master Type: Thesis
Country: China Candidate: S Y Gao Full Text: PDF
GTID: 2178330335960476 Subject: Signal and Information Processing
Abstract/Summary:
With the spread of Web 2.0 technology, an era of information explosion is arriving. Faced with an overwhelming amount of Internet information, users usually need structured or semi-structured information about a query rather than the large number of web pages returned by search engines. As core technologies in natural language processing, entity extraction and relation extraction not only meet users' specific information needs but also promote the development of related technologies. This thesis studies entity attribute extraction based on the TAC-KBP evaluation, covering three aspects: entity linking, entity clustering and entity attribute extraction. The main innovations of this thesis are stated below.

Firstly, four models are proposed for the entity linking task: a retrieval model, a classification model, a coreference resolution model and a rule-based model. The retrieval model emphasizes the role of ranking; the classification model focuses on the use of text classification; the coreference resolution model treats entity linking as a cross-document coreference resolution task; the rule-based model applies different rules to entities of different types. When tested on the TAC2010-KBP dataset, the coreference resolution model and the rule-based model improve the F score by at least 10% over the retrieval model.

Secondly, a bag-of-words model and a strong feature model are proposed for the entity clustering task, and a two-stage model based on a bootstrapping algorithm is also adopted. Tests on the TAC2010-KBP dataset demonstrate that the strong feature model effectively improves precision and that the two-stage model contributes to high recall. The F scores of these two models are at least 20% higher than that of the bag-of-words model.

Thirdly, a pattern-rule-based model and a machine-learning-based model are used for the entity attribute extraction task. The pattern-rule-based model extracts entity attributes with predefined regular expressions. The machine-learning-based model extracts entity attributes with a model trained using the CRF (conditional random field) algorithm. The combination of the two models performs well in the TAC2010-KBP evaluation.
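
The pattern-rule-based extraction described above can be illustrated with a minimal sketch. The abstract does not give the thesis's actual regular expressions, so the attribute names, the patterns and the `extract_attributes` helper below are all hypothetical and serve only to show the general idea of matching predefined patterns against sentences that mention an entity.

```python
import re

# Hypothetical patterns, one per attribute name. These are illustrative
# assumptions, not the regular expressions used in the thesis.
PATTERNS = {
    "date_of_birth": re.compile(
        r"\bborn (?:on )?(?P<value>[A-Z][a-z]+ \d{1,2}, \d{4})"
    ),
    "age": re.compile(r",\s*(?P<value>\d{1,3}),"),
    "title": re.compile(
        r"\bis (?:the|a|an) (?P<value>[a-z]+(?: [a-z]+)?) (?:of|at)\b"
    ),
}


def extract_attributes(entity, sentence):
    """Return {attribute: value} for each pattern that matches the sentence.

    Sentences that do not mention the entity by its exact name are skipped,
    which is a deliberately naive stand-in for entity mention detection.
    """
    if entity not in sentence:
        return {}
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(sentence)
        if match:
            found[name] = match.group("value")
    return found


if __name__ == "__main__":
    text = ("Jane Doe, 45, was born on March 3, 1965 "
            "and is the chief executive of Acme.")
    print(extract_attributes("Jane Doe", text))
```

Hand-written patterns of this kind tend to favour precision over coverage, which fits the abstract's choice to combine them with a learned CRF model.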
Keywords/Search Tags: classification, feature selection, entity linking, entity clustering, entity attribute extraction