
Chinese Named Entity Recognition Based On Conditional Random Fields

Posted on: 2010-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Zhang
GTID: 2178360308478782
Subject: Computer application technology
Abstract/Summary:
Named Entity Recognition (NER) is a fundamental technology for natural language processing (NLP) and an important foundation for many NLP applications, such as information extraction (IE), information retrieval (IR), machine translation (MT), chunk analysis, and question answering (QA). Research on NER is therefore of great value.

The NER task aims to recognize entities such as person names, location names, and organization names in text. In chapter two we analyze the linguistic characteristics of these three kinds of entities and introduce the main current approaches to and systems for NER. In chapter three we describe graphical models and conditional random fields (CRFs). Together, these chapters lay the foundation for the work that follows.

NER is usually cast as a sequence labelling problem. The conditional random field is a statistical model for sequence labelling that is well suited to combining many kinds of features and has been used successfully in many NLP applications. CRF-based approaches are currently the most popular way to perform NER.

Selecting appropriate features is a key issue in improving NER performance. Because so many features are available for the NER task, such as positional features, part-of-speech features, internal form features, external guiding features, and NE resource lists and their subclasses, feature selection becomes an effective way to improve performance and reduce training time. The resources all take the form of lists, and their coverage and precision have a vital impact on NER performance. In this thesis we propose a method to obtain these resources automatically from labeled corpora.
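The abstract does not say how the resource lists are harvested from the labeled corpora. As a minimal sketch of one plausible reading, the Python below assumes a character-level corpus tagged in the common BIO scheme (`B-PER`, `I-PER`, `O`, and so on) and collects every labeled entity into a per-type list. The tag names, the function names `extract_entities` and `build_resource_lists`, and the sample sentence are illustrative assumptions, not taken from the thesis.

```python
from collections import defaultdict


def extract_entities(chars, tags):
    """Collect (entity_text, entity_type) pairs from one BIO-tagged sentence."""
    entities = []
    current, current_type = [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity still open.
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [ch], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            # Continuation of the open entity.
            current.append(ch)
        else:
            # "O" tag or an inconsistent "I-" tag: close the open entity.
            if current:
                entities.append(("".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append(("".join(current), current_type))
    return entities


def build_resource_lists(corpus):
    """Map each entity type to the set of entity strings seen in the corpus."""
    lists = defaultdict(set)
    for chars, tags in corpus:
        for text, etype in extract_entities(chars, tags):
            lists[etype].add(text)
    return lists


# Hypothetical one-sentence corpus: "Zhang San works in Beijing".
corpus = [
    (list("张三在北京工作"),
     ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O"]),
]
lists = build_resource_lists(corpus)
```

Lists built this way could then be looked up as dictionary-match features during CRF training. Whether the thesis filters or subdivides the lists further is not stated in the abstract.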
This thesis evaluates the effectiveness of each feature, and of feature combinations, for CRF-based NER under both the character-based and the word-based frameworks. Finally, building on these feature experiments, we implement an NER system that performs recognition in a uniform framework, and we report and analyze its closed-track and open-track results.
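The abstract does not state how individual features are scored. The keywords mention mutual information, so the following Python sketch assumes, purely for illustration, that a candidate feature is ranked by the mutual information between its observed value and the gold label at each position. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Mutual information in bits between x and y, estimated from (x, y) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)) simplifies to count * n / (px[x] * py[y]).
        mi += p_xy * math.log2(count * n / (px[x] * py[y]))
    return mi


def rank_features(observations):
    """Sort feature names by mutual information with the label, highest first."""
    scores = {name: mutual_information(pairs)
              for name, pairs in observations.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: each pair is (feature value, gold label) at one position.
observations = {
    "in_person_list": [(1, "PER"), (1, "PER"), (0, "O"), (0, "O")],
    "is_sentence_start": [(1, "PER"), (1, "O"), (0, "PER"), (0, "O")],
}
ranking = rank_features(observations)
```

In this toy data the first feature determines the label exactly and the second is independent of it, so the first ranks above the second. A real selection procedure would also need a threshold or a budget on the number of features kept, which the abstract does not specify.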
Keywords/Search Tags: Named Entity Recognition, Sequence Labelling, Conditional Random Fields, Feature Selection, Mutual Information