Chinese Named Entity Recognition Research Based On Discourse

Posted on: 2009-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chen
GTID: 2178360272990332
Subject: Computer application technology
Abstract/Summary:
Named Entity Recognition (NER), an essential task in Natural Language Processing (NLP), plays an important role in information extraction, automatic classification, machine translation, question answering, and information retrieval. Many researchers worldwide have worked in this area and achieved excellent results. However, because of the particular characteristics of the Chinese language, NER remains a difficult task in Chinese Information Processing (CIP). To overcome the limitations of sentence-based Chinese NER, this thesis proposes a discourse-based algorithm for Chinese named entity recognition. The discussion is organized as follows.

First, the thesis introduces the linguistic background of the three main kinds of named entity: person names, location names, and organization names. It also introduces various NER methods, from the earlier rule-based methods to current machine learning methods.

Second, we propose an algorithm for Chinese named entity recognition that combines statistical and rule-based methods. The thesis analyzes the characteristics of Chinese named entities and describes a method that first tags the text with a second-order Conditional Random Fields (CRF) model and then applies a rule set to correct the tagging result produced by the CRF. This method achieves good results.

Third, the thesis reviews the basic concepts and theory of discourse, and then focuses on coreference resolution.

Finally, we propose a new discourse-based algorithm for Chinese named entity recognition, which analyzes named entities over the whole discourse rather than within a single sentence. The system is built in a modular way. Its modules are the sentence-based Chinese named entity recognition module, the coreference resolution module, the organization ellipsis recognition module, the named entity repetition recognition module, and the evaluation module.
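The idea behind a named entity repetition step can be illustrated with a minimal sketch. This is not the thesis's algorithm, which the abstract does not specify. It only assumes that an entity recognized in one sentence may reappear, unrecognized, in another sentence of the same discourse. The function name `propagate_repeated_entities`, the `(surface, type)` tuple format, and the plain substring match are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: an entity is (surface string, entity type).
Entity = Tuple[str, str]


def propagate_repeated_entities(
    sentences: List[str],
    sentence_entities: List[List[Entity]],
) -> List[List[Entity]]:
    """Copy entities found anywhere in the discourse to sentences that
    contain the same surface string but where the tagger missed it."""
    # Build a discourse-level lexicon from the sentence-level output.
    # The first type seen for a surface string wins (an assumption).
    lexicon: Dict[str, str] = {}
    for entities in sentence_entities:
        for surface, entity_type in entities:
            lexicon.setdefault(surface, entity_type)

    result: List[List[Entity]] = []
    for sentence, entities in zip(sentences, sentence_entities):
        found = list(entities)
        already_tagged = {surface for surface, _ in entities}
        for surface, entity_type in lexicon.items():
            # Plain substring match: a deliberate simplification.
            if surface in sentence and surface not in already_tagged:
                found.append((surface, entity_type))
        result.append(found)
    return result


# Toy discourse: the tagger found the organization only in sentence 1.
sentences = ["厦门大学位于福建省。", "厦门大学成立于1921年。"]
sentence_entities = [[("厦门大学", "ORG"), ("福建省", "LOC")], []]
result = propagate_repeated_entities(sentences, sentence_entities)
print(result[1])
```

A real system would need to handle overlapping matches, conflicting types, and abbreviated mentions, which is presumably where the coreference and ellipsis modules come in.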
The experimental results show that, compared with the sentence-based Chinese named entity recognition system, the discourse-based system achieves good performance. In the open test, precision, recall, and F-measure reached 85.35%, 80.62%, and 82.92%, respectively.
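The reported F-measure is consistent with the balanced F1 score, the harmonic mean of precision and recall. The short check below assumes equal weighting (beta = 1), which the abstract does not state explicitly. The function name `f_measure` is chosen here for illustration.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Values reported in the abstract, in percent.
reported_precision = 85.35
reported_recall = 80.62

f1 = f_measure(reported_precision, reported_recall)
print(f"{f1:.2f}")  # prints 82.92
```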
Keywords/Search Tags:Named Entity, Discourse, Recognition Algorithm