Research On The Named Entity Recognition In The Domain Of Lack Of Annotated Data

Posted on:2016-01-28

Degree:Master

Type:Thesis

Country:China

Candidate:C Q Duan

Full Text:PDF

GTID:2308330479991074

Subject:Computer Science and Technology

Abstract/Summary:

The main task of named entity recognition is to identify proper nouns, including person, organization, and location names, and then to classify them into the correct categories. In recent years, statistical methods have become the mainstream approach to this task. Their basic idea is to use large-scale corpora to acquire knowledge and build statistical models. Thanks to the abundant corpora available in the news domain, named entity recognition in that domain has achieved good performance. In non-news domains, however, and especially in domains that lack labeled corpora, performance is still poor. For this reason, semi-supervised learning is often used to improve a model's performance in a specific domain.

In this thesis, we focus on how to use the conditional random fields model efficiently to acquire knowledge from a specific domain and thereby improve named entity recognition performance. Firstly, we attempt to fuse partially annotated data derived from the target domain into the training corpora in order to improve the model's performance in that domain. Fully human-annotated data for a specific domain is difficult to construct, whereas partially annotated data is much easier to acquire. At the same time, partially annotated data carries not only entity information from the target domain but also syntactic structure information. A model trained with partially annotated data from the target domain therefore performs well in that domain. We verify this in experiments on data drawn from novels. Secondly, we propose a method that improves the conditional random fields model so that it can use not only scalar features but also real-valued features. Through this method, we apply word embeddings to the task of named entity recognition. Word embeddings come from deep learning. They contain syntactic and semantic information and are domain-independent, so they serve as a general feature. Our results show that word embeddings help the model perform well in a specific domain. Finally, we use a semi-supervised learning method called bootstrapping to train the conditional random fields model, and the experiments show that this method further improves the results.
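
The abstract does not say how real-valued features are integrated into the conditional random fields model, so the following is only a minimal sketch of one common reading: the state (emission) score is a sum of weight times feature value, where indicator features take the value 1.0 and each word-embedding dimension contributes its real value. It assumes that the "scalar" features mentioned above correspond to such indicator features. The label set, embeddings, weights, and function names are all hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Tuple

# Hypothetical label set and toy 3-dimensional embeddings (illustrative values only).
LABELS: List[str] = ["O", "PER"]
EMBEDDINGS: Dict[str, List[float]] = {
    "alice": [0.9, 0.1, 0.0],
    "walked": [0.0, 0.2, 0.8],
}


def features(word: str) -> Dict[str, float]:
    """Map a token to {feature name: feature value}.

    The word-identity indicator feature has value 1.0; every embedding
    dimension is added as a feature whose value is the real number itself.
    """
    feats = {f"word={word}": 1.0}
    for i, value in enumerate(EMBEDDINGS.get(word, [])):
        feats[f"emb[{i}]"] = value
    return feats


def emission_score(
    weights: Dict[Tuple[str, str], float], label: str, feats: Dict[str, float]
) -> float:
    """Unnormalized state score for one label: sum of weight * feature value."""
    return sum(weights.get((label, name), 0.0) * value for name, value in feats.items())


# Hypothetical weights, standing in for parameters that training would learn.
WEIGHTS: Dict[Tuple[str, str], float] = {
    ("PER", "emb[0]"): 2.0,
    ("O", "emb[2]"): 1.5,
    ("O", "word=walked"): 0.5,
}


def best_label(word: str) -> str:
    """Pick the label with the highest state score (transition scores ignored)."""
    feats = features(word)
    return max(LABELS, key=lambda label: emission_score(WEIGHTS, label, feats))
```

With these toy values, `best_label("alice")` selects `PER` because the first embedding dimension carries a large weight for that label. A full model would add transition scores between adjacent labels and decode the whole sentence jointly, which this sketch leaves out.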

Keywords/Search Tags:

named entity recognition, domain adaptation, conditional random fields, partially annotated data, word embedding

Related items

1	Research On Boosting Chinese Word Segmentation Accuracy With Partially Annotated Data
2	Research Of Named Entity Recognition Based On Conditional Random Fields
3	Study On The Tibetan Word Segmentation And Named Entity Recognition With Conditional Random Fields
4	Application Research On Chinese Named Entity Recognition Based On Domain Ontology
5	Recognition Of Named Entity In Electronic Medical Records Based On Cascaded Conditional Random Fields
6	Chinese Named Entity Recognition Based On Conditional Random Fields
7	Named Entity Recognition Based On Conditional Random Fields
8	Research On Named Entity Extraction Method For Symptom Phenotype
9	Named Entity Recognition Based On Conditional Random Fields Chinese Research
10	The Research Of Conditional Random Fields Based Chinese Named Entity Recognition