CRFs-Based Chinese Named Entity Recognition With Improved Tag Set

Posted on: 2010-12-01
Degree: Master
Type: Thesis
Country: China
Candidate: G M Ceng
Full Text: PDF
GTID: 2178360278465689
Subject: Computer application technology
Abstract/Summary:
Named Entity Recognition (NER) is one of the most difficult tasks in NLP, and it plays a critical role in language processing applications such as Information Extraction and Text Classification. Much effort has been devoted to NER, especially Chinese NER. Unlike English, Chinese has no spaces to mark word boundaries, which makes Chinese NER a more difficult task. Of the many methods that have been proposed, CRFs have performed well in previous research. Most related work focuses on CRF feature selection and uses complex CRF feature templates, which consume a lot of system memory and take a long time to process the training data.

In this paper, we focus on improving the efficiency of a Chinese NER system. We built a two-step system based on the CRF model. In the first step, we use a CRF model to recognize NEs, with an improved tag set that makes the process more efficient. In the second step, we apply post-processing, consisting of TBL and a rule-based method, to improve accuracy.

Our experiments show that the five-tag set with template-3 achieves higher precision than the four-tag set with template-5. Although its recall is not as good as that of template-5, the F-value, which represents overall system performance, is very close. The five-tag set with template-3 also uses fewer system resources and takes less time to train. We conclude that even a simple CRF template can achieve comparable system performance when it is paired with a tag set that matches it. Our system achieves an F-value of 93.49 while using fewer system resources. Finally, we give our analysis of the experiments and comment on future work.
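The comparison above rests on precision, recall, and the F-value. As a rough illustration of how these three numbers relate, the sketch below scores predicted entities against gold entities. It assumes exact-match, entity-level scoring and the balanced F-value (F = 2PR / (P + R)); the abstract does not state the exact evaluation protocol, so this is an assumption. The `Entity` representation, the function name `precision_recall_f`, and the toy data are hypothetical and are not taken from the thesis.

```python
from typing import Iterable, Tuple

# Hypothetical representation of one entity: (start offset, end offset, type).
Entity = Tuple[int, int, str]


def precision_recall_f(
    gold: Iterable[Entity], predicted: Iterable[Entity]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-value) under exact-match scoring.

    Assumption: an entity counts as correct only if its boundaries and
    type both match a gold entity exactly.
    """
    gold_set = set(gold)
    pred_set = set(predicted)
    correct = len(gold_set & pred_set)

    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    # Balanced F-value: harmonic mean of precision and recall.
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


# Toy data, invented for illustration only.
gold = [(0, 2, "PER"), (5, 7, "LOC"), (10, 14, "ORG"), (20, 22, "LOC")]
predicted = [(0, 2, "PER"), (5, 7, "LOC"), (10, 13, "ORG")]

p, r, f = precision_recall_f(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Because the F-value is a harmonic mean, a configuration with higher precision and somewhat lower recall can end up with an F-value very close to that of a configuration with the opposite trade-off, which is the situation the abstract describes.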
Keywords/Search Tags: NER, CRFs, feature-template, tag-set