Chinese Nested Named Entity Recognition Research

Posted on: 2012-12-07
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Fu
GTID: 2218330368994010
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the explosive growth of online information amounts to an unprecedented "Big Bang". Extracting necessary information and knowledge from large-scale, unstructured text has become both a research focus and a challenge in natural language processing. As one of the important subtasks of information extraction, named entity recognition (NER) aims to identify phrases in sentences and documents that denote person names, location names, organization names, dates and times, and numeric expressions. NER plays a key role in many natural language processing applications, such as information retrieval, question answering, and machine translation.
This thesis introduces entity morphemes into Chinese NER and thereby combines entity-internal structural features with contextual information in a machine learning framework to identify person, location, and organization names in Chinese text, especially nested location and organization names.
First, four two-layer models for Chinese NER are built on maximum entropy and conditional random fields (CRFs). Comparative experiments on different data sets show that the CRF-based two-layer model is much better suited to Chinese nested NER. In addition, a post-processing method based on mutual information is proposed to correct some of the errors produced by the CRF-based layered model, and experiments show that this post-processing module further improves nested NER performance.
Second, simple NER and nested NER are treated as two separate tasks. Accordingly, a five-layer sequence labeling scheme is proposed that handles lexical features and phrase structure within the CRF-based layered model, further improving the recognition of nested named entities.
Finally, entity morphemes are introduced into Chinese nested named entity recognition.
To this end, a set of multi-level prefixes and suffixes is extracted from the training data using the logistic transform of logistic regression models. Based on these entity morphemes, a variety of lexical features and entity-structure cues can readily be explored for nested NER. Experiments show that the proposed system is effective for most of the nested named entities evaluated.
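The layered models described above recognize nested entities in separate labeling passes. As a minimal sketch of one common way to prepare data for such models (an assumption, not necessarily the encoding used in the thesis), nested spans can be split into an inner layer of simple entities and an outer layer of entities that contain other entities, with each layer encoded as BIO tags for its own sequence labeler. The function names `spans_to_bio` and `two_layer_tags`, the LOC/ORG/PER labels, and the example sentence 北京大学的张三 ("Zhang San of Peking University", where the organization 北京大学 contains the location 北京) are hypothetical illustrations.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end_exclusive, label), as character offsets.
Span = Tuple[int, int, str]


def spans_to_bio(length: int, spans: List[Span]) -> List[str]:
    """Encode non-overlapping spans as a single layer of BIO tags."""
    tags = ["O"] * length
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def two_layer_tags(sentence: str, spans: List[Span]) -> Tuple[List[str], List[str]]:
    """Split entity spans into an inner (simple) layer and an outer (nested) layer.

    Assumption for this sketch: a span goes to the outer layer if it strictly
    contains another span; every other span goes to the inner layer. Only two
    levels of nesting and no partially overlapping spans are supported.
    """
    def strictly_contains(a: Span, b: Span) -> bool:
        # Different boundaries, and b lies entirely inside a.
        return (a[0], a[1]) != (b[0], b[1]) and a[0] <= b[0] and b[1] <= a[1]

    outer = [s for s in spans if any(strictly_contains(s, t) for t in spans)]
    inner = [s for s in spans if s not in outer]
    n = len(sentence)
    return spans_to_bio(n, inner), spans_to_bio(n, outer)


if __name__ == "__main__":
    sentence = "北京大学的张三"
    entities = [(0, 2, "LOC"), (0, 4, "ORG"), (5, 7, "PER")]
    inner, outer = two_layer_tags(sentence, entities)
    # One row per character: character, inner-layer tag, outer-layer tag.
    for ch, a, b in zip(sentence, inner, outer):
        print(ch, a, b)
```

In a cascaded setup, one CRF would typically be trained per layer, with the inner-layer predictions passed to the outer layer as additional features; the abstract does not specify exactly how the thesis connects its layers or what its five-layer scheme labels.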
Keywords/Search Tags: Named entity recognition, nested named entity, maximum entropy, conditional random fields