Semi-supervised Named Entity Recognition

Posted on:2012-12-11

Degree:Master

Type:Thesis

Country:China

Candidate:H Chen

GTID:2178330335950554

Subject:Computer Science and Technology

Abstract/Summary:

Named entity recognition (NER) is the task of identifying and classifying the phrases in a text that refer to person names, place names, other proper names, and meaningful times, dates and numbers. Its key steps are determining entity boundaries and determining entity types (e.g. person, place and organization). The objects of NER are named entities, which are usually divided into 3 categories and 7 subclasses. The 3 categories are entities, times and numbers. The 7 subclasses are person names, place names, organization names, times, dates, currencies and percentages. NER is a difficult task because there are many entities, their structure is complex, and they come from different cultural backgrounds. Methods used in China and abroad are mainly rule-based, statistical, or a combination of the two. Depending on the problem being solved, NER approaches can also be divided into supervised, semi-supervised and unsupervised learning.

This thesis designs and implements a semi-supervised NER system consisting of a training subsystem and a testing subsystem. The training subsystem starts from seed entities and uses search engines to identify candidate entities. It then filters out noise and generates a list of named entities and their types. Noise filtering strongly affects the results of NER. The behavior of a noise filtering algorithm depends on the language environment and on the entity type, so different noise filtering methods are applied to different texts and entity types to obtain better results.

Noise filtering is divided into word-level noise filtering and final noise filtering. For word-level filtering, this thesis proposes and implements three filters: a lexical feature noise filter, an information redundancy noise filter, and a combination of the two. Experiments show that the combined filter performs better than either filter alone.
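The training loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation, and all function names are hypothetical. The assumptions are:

- The search engine is replaced by a hypothetical in-memory list of result snippets (`SNIPPETS`).
- A candidate is any item that shares a comma/"and" enumeration with a known entity.
- The lexical feature is a capitalised-word pattern for place names.
- Information redundancy is modelled as the number of distinct snippets supporting a candidate (`min_support=2`).
- Only one entity type is handled.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for search-engine result snippets (the thesis queried a
# real search engine, which is not modelled here).
SNIPPETS = [
    "Beijing, Shanghai, Wuhan and Chengdu",
    "Beijing and Wuhan",
    "Shanghai, Chengdu, Wuhan and more",
    "Beijing, Shanghai and Hotels",
    "Shanghai, Chengdu and more",
    "Wuhan, Nanjing and Chengdu",
    "Nanjing and Wuhan",
]

# Assumed lexical feature for place names: one capitalised alphabetic word.
LEXICAL_PATTERN = re.compile(r"[A-Z][a-z]+")


def split_enumeration(text):
    """Split an enumeration such as 'A, B and C' into its items."""
    return re.split(r",\s*|\s+and\s+", text)


def extract_candidates(known, snippets):
    """Map each unknown item to the ids of snippets in which it shares an
    enumeration with at least one known entity."""
    support = defaultdict(set)
    for sid, text in enumerate(snippets):
        items = split_enumeration(text)
        if not any(item in known for item in items):
            continue
        for item in items:
            if item not in known:
                support[item].add(sid)
    return support


def lexical_filter(candidate):
    """Word-level filter 1: keep candidates whose surface form fits the pattern."""
    return LEXICAL_PATTERN.fullmatch(candidate) is not None


def redundancy_filter(sources, min_support=2):
    """Word-level filter 2: keep candidates seen in enough distinct snippets."""
    return len(sources) >= min_support


def bootstrap(seeds, snippets, max_iter=10):
    """Grow the entity set from seeds, accepting only candidates that pass
    both filters, until an iteration adds nothing new."""
    known = set(seeds)
    for _ in range(max_iter):
        support = extract_candidates(known, snippets)
        accepted = {c for c, src in support.items()
                    if lexical_filter(c) and redundancy_filter(src)}
        if not accepted:
            break
        known |= accepted
    return known


if __name__ == "__main__":
    print(sorted(bootstrap({"Beijing", "Shanghai"}, SNIPPETS)))
    # prints ['Beijing', 'Chengdu', 'Nanjing', 'Shanghai', 'Wuhan']
```

On this toy data, each filter alone lets one noise item through. The lexical filter admits `Hotels`, and the redundancy filter admits `more`. Requiring both keeps only `Wuhan` and `Chengdu` in the first round. The second round then adds `Nanjing` from snippets that mention only the newly accepted entities.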
After word-level noise filtering, this thesis also proposes and implements a statistical semantic noise filter. Its running time is too long for it to be included in the iterative filtering process, so it is applied only to the final entity list. The annotation subsystem uses the generated list to update the ICTCLAS user dictionary and then annotates documents, which improves annotation performance.
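Annotation with the generated list can be illustrated by a dictionary-based longest-match tagger. This sketch does not use ICTCLAS or its user-dictionary interface, and it rests on three assumptions:

- The input is already tokenised.
- A hypothetical `ENTITY_LIST` maps token tuples to assumed type labels (`LOC`, `ORG`).
- Forward maximum matching is used, so the longest dictionary entry wins (for example, `Wuhan University` rather than `Wuhan`).

```python
# Hypothetical entity list from the training subsystem: token tuple -> type.
# The labels LOC/ORG are assumed for illustration.
ENTITY_LIST = {
    ("Wuhan",): "LOC",
    ("Wuhan", "University"): "ORG",
    ("Nanjing",): "LOC",
}


def annotate(tokens, entity_list):
    """Tag tokens by forward maximum matching against entity_list; tokens
    not covered by any entry are tagged 'O'."""
    max_len = max(len(key) for key in entity_list)
    tagged, i = [], 0
    while i < len(tokens):
        # Try the longest possible span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in entity_list:
                tagged.append((" ".join(span), entity_list[span]))
                i += n
                break
        else:
            # No entry starts here: emit the single token as non-entity.
            tagged.append((tokens[i], "O"))
            i += 1
    return tagged


if __name__ == "__main__":
    sentence = "He studied at Wuhan University before moving to Nanjing"
    print(annotate(sentence.split(), ENTITY_LIST))
```

In practice the entity list would come from the training subsystem after final noise filtering. The toy entries above only illustrate how a multi-word entry takes precedence over its shorter prefix.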

Keywords/Search Tags:

named entity, noise filter, semi-supervised, NLP