Research on Chinese Named Entity Recognition Based on Conditional Random Fields

Posted on: 2011-05-31 | Degree: Master | Type: Thesis
Country: China | Candidate: J B Zhang | Full Text: PDF
GTID: 2178330338489882 | Subject: Computer Science and Technology
Abstract/Summary:
Named entity recognition (NER) is one of the most important problems in natural language processing (NLP). It plays a critical role in many NLP applications, such as information extraction, information filtering, information retrieval, question answering systems, and machine translation. Because of the particularity and complexity of Chinese, research on Chinese NER lags far behind research on English NER. Research on Chinese NER is therefore of great value for advancing related NLP technology.

This thesis studies Chinese NER based on conditional random fields (CRFs), covering person names, location names, and organization names. Chinese offers many semantic features that can help NER. The thesis mines NE indicator words by comparing word frequencies in the contexts of particular NEs, mines NE structure information in depth, and then expands the mined semantic knowledge using Wiki. It builds the CRF model from word features, POS features, tag features, and semantic features, and validates these features experimentally. Finally, it designs and implements a Chinese NER system that achieves high accuracy: in experiments on the January 1998 corpus, the F-values for person names, location names, and organization names reach 93.97%, 91.49%, and 84.67%, respectively.

In addition, recognizing NEs in large amounts of data on a single machine takes a long time. The thesis proposes a method that uses the Hadoop Map/Reduce framework for parallel execution; in the experiment, recognition time is reduced by a factor of 14.
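As an illustration of how the feature types named in the abstract (word, POS, and semantic indicator-word features) might be turned into per-token CRF input, here is a minimal Python sketch. It is not the thesis's implementation: the indicator list `ORG_INDICATORS`, the feature names, the one-token context window, and the example sentence are all assumptions made for illustration. Tag features (dependencies between adjacent labels) are normally modeled by the CRF itself and so do not appear here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical indicator-word list. The thesis mines such words from
# context frequencies, but its actual lists are not given in the abstract.
ORG_INDICATORS: Set[str] = {"公司", "大学"}

Token = Tuple[str, str]  # (word, POS tag)


def token_features(sent: List[Token], i: int,
                   indicators: Set[str] = ORG_INDICATORS) -> Dict[str, object]:
    """Build a feature dict for token i of a segmented, POS-tagged sentence."""
    word, pos = sent[i]
    feats: Dict[str, object] = {
        "word": word,                        # word feature
        "pos": pos,                          # POS feature
        "is_indicator": word in indicators,  # semantic feature
    }
    # Context window of one token on each side (an assumed window size).
    if i > 0:
        feats["prev_word"], feats["prev_pos"] = sent[i - 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(sent) - 1:
        feats["next_word"], feats["next_pos"] = sent[i + 1]
        feats["next_is_indicator"] = sent[i + 1][0] in indicators
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(sent: List[Token]) -> List[Dict[str, object]]:
    """Feature dicts for every token in the sentence."""
    return [token_features(sent, i) for i in range(len(sent))]


# Made-up example sentence with assumed POS tags.
example: List[Token] = [("北京", "ns"), ("大学", "n"), ("成立", "v")]
features = sentence_features(example)
```

A list of feature dicts per sentence is a common input shape for CRF toolkits, so a structure like `features` could be paired with a label sequence for training. The choice of toolkit and label scheme is left open here because the abstract does not specify them.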
Keywords/Search Tags: Chinese named entity recognition, conditional random fields, semantic knowledge, features, parallelization