The Study Of Recognition And Classification Of Entity For Question Answering System

Posted on: 2010-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: B Zhou
Full Text: PDF
GTID: 2178360272985240
Subject: Computer application technology
Abstract/Summary:
A question answering system (QAS) is an advanced form of information retrieval and an active research focus in that field. It is an integrated processing system built on word segmentation, lexical analysis, retrieval, entity recognition, and answer extraction. Entity recognition and tagging is one of the key techniques of a QAS, because it directly determines how the question type is judged and how answers are extracted. Question types can be classified in different ways depending on the specific requirements. In general, factoid questions concern persons, times, places, quantities, and so on. Some types, however, have several levels; for example, a place can be subdivided into country, province, city, mountain, river, lake, and so on. This thesis focuses on methods for entity recognition and classification that account for the hierarchical levels and diversity of entities. The main work is as follows:

1. Based on a study of question classification in general-domain QAS and of the entity classification system, this thesis proposes a method for entity recognition and classification that combines rules and statistics, mainly by combining an entity classification dictionary with conditional random fields (CRFs).

2. This thesis studies the recognition of registered (in-dictionary) words based on the entity classification dictionary. It builds an entity classification dictionary for QAS that contains almost 300 thousand words, drawn from sources such as the open database of the Chinese Wikipedia. The dictionary is stored in memory as an index tree and is then used to recognize and classify registered words, which improves the results of entity recognition and classification.

3. This thesis proposes a two-stage method for named entity recognition based on CRFs. In the closed test, the F-score of the two-stage method is 86.30%, which is 1.5% lower than the one-stage result of 88.01%; however, the two-stage method reduces the time complexity to 20%.

4. This thesis further studies the recognition of Chinese organization names based on CRFs. In feature extraction, it effectively fuses linguistic features with word-concept features, and it runs comparative experiments between character-based and word-based models under different feature choices. Analysis of the experimental results shows that the two models differ in complementary ways and that the fused model performs better than either single model.

This thesis is a preliminary attempt at entity recognition and classification with multi-type and hierarchical features, and it obtains some results. As Chinese named entity recognition technology develops, the performance of entity recognition and classification is expected to improve further.
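
The dictionary lookup in item 2 of the abstract (a classified entity dictionary held in memory as an index tree) can be illustrated with a small trie. This is a minimal sketch, not the thesis implementation: the class names, the hierarchical type labels such as "place/city", the sample entries, and the use of forward longest matching are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TrieNode:
    # One child node per next character.
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    # Set only on nodes where a dictionary entry ends (hypothetical label format).
    entity_type: Optional[str] = None


class EntityDictionary:
    """Hypothetical in-memory index tree mapping words to entity types."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, word: str, entity_type: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.entity_type = entity_type

    def longest_match(self, text: str, start: int) -> Optional[Tuple[int, str]]:
        """Return (end index, entity type) of the longest entry starting at `start`."""
        node = self.root
        best = None
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.entity_type is not None:
                best = (i + 1, node.entity_type)
        return best

    def tag(self, text: str) -> List[Tuple[str, str]]:
        """Scan left to right, keeping the longest dictionary entry at each position."""
        found = []
        pos = 0
        while pos < len(text):
            match = self.longest_match(text, pos)
            if match is None:
                pos += 1  # no entry starts here; move on one character
            else:
                end, entity_type = match
                found.append((text[pos:end], entity_type))
                pos = end
        return found


# Illustrative entries only; the real dictionary is described as ~300 thousand words.
dictionary = EntityDictionary()
dictionary.add("长江", "place/river")
dictionary.add("北京", "place/city")
dictionary.add("北京大学", "organization/university")

print(dictionary.tag("北京大学在北京"))
```

Lookup cost in such a tree depends on the length of the matched word rather than on the number of dictionary entries, which is one plausible reason to keep a large dictionary in this form. The slash-separated labels are only one way to encode the multi-level types (for example, place subdivided into city or river) that the abstract mentions.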
Keywords/Search Tags:Question answering system, entity recognition and classification, classified dictionary, conditional random fields, recognition of organization name, character-based model