Research on Named Entity Recognition in Domains Lacking Annotated Data

Posted on: 2016-01-28 | Degree: Master | Type: Thesis
Country: China | Candidate: C Q Duan | Full Text: PDF
GTID: 2308330479991074 | Subject: Computer Science and Technology
Abstract/Summary:
Named entity recognition aims to identify proper nouns, including person, organization, and location names, and to classify them into the correct categories. In recent years, statistical methods have become the mainstream approach to this task. Their basic idea is to use large-scale corpora to acquire knowledge and build a statistical model. Thanks to the abundance of corpora in the news domain, named entity recognition in that domain has achieved good performance. However, in non-news domains, especially those that lack labeled corpora, performance is still poor. For this reason, semi-supervised learning is commonly used to improve a model's performance in a specific domain.

In this thesis, we focus on how to use the conditional random fields model efficiently to acquire knowledge from a specific domain and thereby improve named entity recognition performance. First, we attempt to merge partially annotated data derived from the target domain into the training corpus. Constructing human-annotated data for a specific domain is difficult, whereas partially annotated data is much easier to obtain. Moreover, partially annotated data carries not only entity information from the target domain but also syntactic structure information. A model trained with partially annotated data from the target domain therefore performs well in that domain. We verify this with experiments on data drawn from novels. Second, we propose a method that extends the conditional random fields model so that it can use real-valued features in addition to scalar features. With this method, we apply word embeddings to named entity recognition. Word embeddings originate from deep learning; they encode syntactic and semantic information and are domain-independent, which makes them a general-purpose feature. Our results show that word embeddings help the model perform well in a specific domain. Finally, we train the conditional random fields model with bootstrapping, a semi-supervised learning method, and the experiments show that it further improves the results.
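
To illustrate what a real-valued feature in a conditional random fields model might look like, here is a minimal sketch. It is not the implementation from the thesis. It assumes that "scalar features" means the usual binary indicator features, and that each embedding dimension, paired with a label, acts as a feature whose value is the embedding component itself. The label set, the two-dimensional embeddings, the weights, and all function names are hypothetical, and normalization is done by brute force, which is only practical for very short sentences.

```python
import math
from itertools import product

LABELS = ["O", "PER"]  # hypothetical label set

# Hypothetical 2-dimensional word embeddings (made-up numbers for illustration).
EMBEDDINGS = {
    "alice": [0.9, -0.2],
    "smiled": [-0.5, 0.7],
}

def features(prev_label, label, words, t):
    """Return a dict mapping feature name -> feature value at position t."""
    word = words[t]
    feats = {
        # Binary indicator features: the value is 1.0 whenever they fire.
        ("word", word, label): 1.0,
        ("trans", prev_label, label): 1.0,
    }
    # Real-valued features: one per embedding dimension, paired with the label.
    # The feature value is the embedding component, not 0 or 1.
    for d, value in enumerate(EMBEDDINGS.get(word, [])):
        feats[("emb", d, label)] = value
    return feats

def sequence_score(words, labels, weights):
    """Unnormalized log-score: sum of weight * feature value over positions."""
    score = 0.0
    prev = "<s>"  # start-of-sentence marker
    for t, label in enumerate(labels):
        for name, value in features(prev, label, words, t).items():
            score += weights.get(name, 0.0) * value
        prev = label
    return score

def sequence_probability(words, labels, weights):
    """p(labels | words), normalized by enumerating every label sequence."""
    log_z = math.log(sum(
        math.exp(sequence_score(words, list(cand), weights))
        for cand in product(LABELS, repeat=len(words))
    ))
    return math.exp(sequence_score(words, labels, weights) - log_z)

# Hypothetical weights; in practice they would be learned from training data.
WEIGHTS = {
    ("emb", 0, "PER"): 2.0,
    ("emb", 1, "O"): 1.5,
    ("trans", "PER", "O"): 0.5,
}

if __name__ == "__main__":
    sentence = ["alice", "smiled"]
    for cand in product(LABELS, repeat=len(sentence)):
        print(cand, round(sequence_probability(sentence, list(cand), WEIGHTS), 3))
```

The only difference from a conventional feature function is that the `("emb", d, label)` entries carry the embedding value instead of 1.0, so a single learned weight scales with the embedding component. A practical model would replace the brute-force normalization with the forward algorithm.
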
Keywords/Search Tags: named entity recognition, domain adaptation, conditional random fields, partially annotated data, word embedding