Semi-supervised Named Entity Recognition

Posted on: 2012-12-11
Degree: Master
Type: Thesis
Country: China
Candidate: H Chen
Full Text: PDF
GTID: 2178330335950554
Subject: Computer Science and Technology
Abstract/Summary:
Named entity recognition (NER) identifies the phrases in a text that refer to person names, place names, other proper names, and meaningful time, date and number expressions, and classifies them. Its key tasks are to determine the boundary of each entity and its type (e.g. person, place or organization). The objects of named entity recognition are named entities, which are usually divided into three categories and seven subclasses. The three categories are entity names, time expressions and number expressions; the seven subclasses are person names, place names, organization names, time, date, currency and percentages. Because entities are numerous, structurally complex and shaped by different cultural backgrounds, named entity recognition is a difficult task. The main methods used in China and abroad are rule-based methods, statistical methods, and combinations of the two. In terms of learning approach, NER methods can be divided into supervised, semi-supervised and unsupervised learning.
This thesis designs and implements a semi-supervised NER system consisting of a training subsystem and a testing subsystem. The training subsystem starts from seed entities, retrieves text through search engines, identifies candidate entities, and filters out noise to generate a list of named entities and their types. The quality of noise filtering affects the recognition results. Because the language environment and the entity type both influence how well a noise filtering algorithm works, different filtering methods are used for different texts and different entity types to obtain better results. Noise filtering is divided into word-level noise filtering and final noise filtering. For word-level noise filtering, this thesis proposes and implements a lexical-feature noise filter, an information-redundancy noise filter, and a combination of the two. Experiments show that the combination of the information-redundancy filter and the lexical-feature filter performs better than either method alone.
After word-level noise filtering, we also propose and implement a statistical semantic noise filter. Because its running time is too long, it cannot be used inside the iterative process, so it is applied only to the final list of entities. Based on the generated list, the annotation subsystem updates the ICTCLAS user dictionary and uses it to annotate documents, which improves the performance of ICTCLAS.
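The training loop described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not specify how either word-level filter works, so the lexical check (alphabetic strings of bounded length) and the redundancy check (a candidate must appear in at least `min_count` retrieved snippets) are assumptions, as are all function names, the toy snippets, and the stand-ins for the search engine and the candidate extractor. Entity types and the final statistical semantic filter are omitted.

```python
from collections import Counter

def lexical_filter(candidate, max_len=20):
    # Assumed lexical-feature check: alphabetic strings of bounded length.
    return 0 < len(candidate) <= max_len and candidate.isalpha()

def redundancy_filter(candidate, counts, min_count=2):
    # Assumed information-redundancy check: the candidate must occur
    # in at least `min_count` of the retrieved snippets.
    return counts[candidate] >= min_count

def bootstrap(seeds, search, extract, rounds=3):
    """Grow an entity list from seeds, filtering noise in every round."""
    entities = set(seeds)
    frontier = set(seeds)
    for _ in range(rounds):
        counts = Counter()
        for entity in frontier:
            for snippet in search(entity):
                # Count each candidate once per snippet.
                counts.update(set(extract(snippet)))
        accepted = {
            c for c in counts
            if c not in entities
            and lexical_filter(c)
            and redundancy_filter(c, counts)
        }
        if not accepted:
            break
        entities |= accepted
        frontier = accepted  # only newly accepted entities are searched next
    return entities

# Hypothetical stand-ins for the search engine and the candidate extractor.
TOY_SNIPPETS = {
    "Beijing": ["Beijing and Shanghai", "Beijing and Shanghai", "Beijing and 2012"],
    "Shanghai": ["Shanghai and Tianjin", "Shanghai and Tianjin", "Shanghai and Oops"],
}

def toy_search(entity):
    return TOY_SNIPPETS.get(entity, [])

def toy_extract(snippet):
    return [token for token in snippet.split() if token != "and"]

result = bootstrap({"Beijing"}, toy_search, toy_extract)
print(sorted(result))
```

In this toy run, "2012" is rejected by the lexical check and "Oops" by the redundancy check, so only candidates that pass both filters join the list. A slow filter such as the statistical semantic one would be applied once to the returned set rather than inside the loop.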
Keywords/Search Tags: named entity, noise filter, semi-supervised, NLP