Weakly Supervised Named Entity Recognition Based on Online Encyclopedia

Posted on: 2021-05-31; Degree: Master; Type: Thesis
Country: China; Candidate: M L Li
GTID: 2428330605974773; Subject: Computer Science and Technology
Abstract/Summary:
Named Entity Recognition (NER) aims to recognize and classify phrases in text that refer to a predefined set of named entity types, such as PERSON, ORGANIZATION, and LOCATION. As a core task of Natural Language Processing (NLP), NER is crucial to many popular applications, including information extraction, question answering, and knowledge graph construction. In recent years, with the development of deep learning and the explosive growth of data, deep neural network methods for NER have achieved good results and have gradually become mainstream. However, most deep learning models are data-driven and supervised, requiring large amounts of manually labeled data, which is extremely costly to produce. It is therefore of great practical significance to study how to reduce the cost of acquiring labeled corpora for NER.
Weakly supervised learning is suited to scenarios in which only a small amount of labeled data, or noisily labeled data, is available. Many researchers have applied weakly supervised learning to NER, but existing weakly supervised NER methods still have shortcomings: for example, crowdsourcing approaches focus only on data quality, and active learning relies on poor sample selection strategies. To tackle these problems, this thesis develops weakly supervised NER methods based on an online encyclopedia, as follows:
(1) An NER method combining active learning and crowdsourcing. This method effectively reduces the amount of labeled corpus required and saves labeling costs without degrading the performance of the NER model.
(2) A method that uses Wikipedia to generate an NER corpus. First, an entity classifier is trained to classify Wikipedia pages into entity types. Next, the Wikipedia pages that linked texts point to are classified into entity types, and the texts containing inner links are then converted into an NER corpus.
(3) A weakly supervised NER method combining data augmentation and self-learning. First, data augmentation is applied to a small NER corpus to expand the training set. Then, self-learning is applied to the model trained on the expanded set to further improve its performance. This method effectively alleviates the problem of insufficient training data and has practical application value.
We performed experiments on several real-world datasets. The results show that the proposed weakly supervised learning methods can significantly reduce the amount of labeled corpus required by NER models, and that the proposed data augmentation method can effectively expand the size of the corpus.
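For contribution (1), the abstract does not say which sample selection strategy or crowd label aggregation rule is used. As a rough illustration only, the sketch below pairs least-confidence uncertainty sampling (send the sentences the current model is least sure about to annotators) with token-level majority voting over several crowd workers' tag sequences. Both strategies and the helper names `least_confidence`, `select_batch`, and `majority_vote` are hypothetical stand-ins, not the thesis's actual method.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical sketch: least-confidence sampling + majority-vote aggregation.
# Neither strategy is confirmed by the thesis abstract.


def least_confidence(token_probs: Sequence[Sequence[float]]) -> float:
    """Sentence uncertainty = 1 - product of each token's top tag probability.

    The product approximates the probability of the best tag sequence,
    assuming tokens are tagged independently.
    """
    confidence = 1.0
    for probs in token_probs:
        confidence *= max(probs)
    return 1.0 - confidence


def select_batch(pool: Dict[str, Sequence[Sequence[float]]], k: int) -> List[str]:
    """Return the ids of the k unlabeled sentences with the highest uncertainty."""
    return sorted(pool, key=lambda sid: least_confidence(pool[sid]), reverse=True)[:k]


def majority_vote(annotations: List[List[str]]) -> List[str]:
    """Merge several workers' tag sequences for one sentence, token by token.

    Ties go to the tag seen first, because Counter.most_common keeps
    first-encountered order among equal counts.
    """
    return [Counter(tags).most_common(1)[0][0] for tags in zip(*annotations)]


if __name__ == "__main__":
    # Per-token tag distributions from the current model (made-up numbers).
    pool = {
        "s1": [[0.9, 0.1], [0.8, 0.2]],
        "s2": [[0.5, 0.5], [0.6, 0.4]],
        "s3": [[0.99, 0.01]],
    }
    print(select_batch(pool, 2))  # ['s2', 's1']
    # Three crowd workers label the same two-token sentence.
    print(majority_vote([["B-PER", "O"], ["B-PER", "B-LOC"], ["O", "O"]]))  # ['B-PER', 'O']
```

In a full loop, the selected sentences would go to several workers, their answers would be merged, and the model would be retrained on the growing labeled set until the labeling budget runs out.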
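Contribution (2) turns Wikipedia's inner links into labeled data. The sketch below shows only the conversion step on raw wikitext, under stated assumptions: each `[[Target]]` or `[[Target|anchor text]]` link whose target page has a known entity type becomes a B-/I- tagged span, and every other token is tagged O. The `page_types` dictionary is a hypothetical stand-in for the trained page classifier's output, tokenization is naive whitespace splitting (a Chinese corpus would need a real segmenter or character-level tags), and redirects, templates, and nested markup are ignored.

```python
import re
from typing import Dict, List, Tuple

# Matches wikitext inner links: [[Target]] or [[Target|anchor text]]
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def wikitext_to_bio(text: str, page_types: Dict[str, str]) -> List[Tuple[str, str]]:
    """Convert linked wikitext into (token, BIO tag) pairs.

    page_types maps a page title to an entity type; it is a hypothetical
    stand-in for the output of a trained Wikipedia page classifier.
    """
    tokens: List[Tuple[str, str]] = []
    pos = 0
    for m in LINK_RE.finditer(text):
        # Plain text before the link is tagged O.
        tokens.extend((tok, "O") for tok in text[pos:m.start()].split())
        target = m.group(1).strip()
        anchor = (m.group(2) or m.group(1)).strip()  # visible text of the link
        etype = page_types.get(target)  # None: target is not a typed entity
        for i, tok in enumerate(anchor.split()):
            if etype is None:
                tokens.append((tok, "O"))
            else:
                tokens.append((tok, ("B-" if i == 0 else "I-") + etype))
        pos = m.end()
    # Trailing plain text after the last link.
    tokens.extend((tok, "O") for tok in text[pos:].split())
    return tokens


if __name__ == "__main__":
    page_types = {"Alan Turing": "PER", "United Kingdom": "LOC"}
    text = "[[Alan Turing]] worked in [[United Kingdom|Britain]] at [[Bletchley Park]] ."
    print([tag for _, tag in wikitext_to_bio(text, page_types)])
    # ['B-PER', 'I-PER', 'O', 'O', 'B-LOC', 'O', 'O', 'O', 'O']
```

Note that the anchor text ("Britain"), not the page title, is what gets tagged. Because Wikipedia editors usually link only the first mention of an entity in an article, later unlinked mentions end up tagged O, so a corpus built this way is likely to contain some false negatives.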
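Contribution (3) chains two steps. The abstract does not name the specific augmentation or self-learning variant, so the sketch below assumes two common choices: mention replacement (swap an entity for another mention of the same type drawn from the training set) and confidence-threshold self-training (pseudo-label the unlabeled sentences the model is confident about, add them to the training set, and retrain). All function names, the `threshold` and `rounds` parameters, and the toy memorizing "model" are hypothetical; a real system would plug an actual NER tagger into the `train` and `predict` callables.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch: mention-replacement augmentation + confidence-threshold
# self-training. The thesis abstract does not name its exact techniques.

Sentence = List[Tuple[str, str]]  # (token, BIO tag) pairs


def entity_spans(sent: Sentence) -> List[Tuple[int, int, str]]:
    """Return (start, end, type) for each entity span; end is exclusive."""
    spans, start, etype = [], None, ""
    for i, (_, tag) in enumerate(sent + [("", "O")]):  # sentinel closes a final span
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i, etype))
            start = None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def mention_replacement(data: List[Sentence], n_new: int, seed: int = 0) -> List[Sentence]:
    """Create new sentences by swapping one entity for another of the same type."""
    rng = random.Random(seed)
    mentions: Dict[str, List[List[str]]] = defaultdict(list)
    for sent in data:
        for s, e, t in entity_spans(sent):
            mentions[t].append([tok for tok, _ in sent[s:e]])
    new_data: List[Sentence] = []
    for _ in range(n_new):
        sent = rng.choice(data)
        spans = entity_spans(sent)
        if not spans:
            continue
        s, e, t = rng.choice(spans)
        toks = rng.choice(mentions[t])
        tags = ["B-" + t] + ["I-" + t] * (len(toks) - 1)  # relabel with same type
        new_data.append(sent[:s] + list(zip(toks, tags)) + sent[e:])
    return new_data


def self_train(labeled, unlabeled, train: Callable, predict: Callable,
               threshold: float = 0.9, rounds: int = 3):
    """Repeatedly pseudo-label confident sentences and retrain on them."""
    data, remaining = list(labeled), list(unlabeled)
    model = train(data)
    for _ in range(rounds):
        confident, rest = [], []
        for tokens in remaining:
            tags, conf = predict(model, tokens)
            if conf >= threshold:
                confident.append(list(zip(tokens, tags)))
            else:
                rest.append(tokens)
        if not confident:  # nothing new to learn from; stop early
            break
        data += confident
        remaining = rest
        model = train(data)
    return model, data


# Toy stand-in model: memorizes each token's most frequent tag.
def toy_train(data: List[Sentence]) -> Dict[str, str]:
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sent in data:
        for tok, tag in sent:
            counts[tok][tag] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}


def toy_predict(model: Dict[str, str], tokens: List[str]) -> Tuple[List[str], float]:
    """Tags unseen tokens O; confidence = share of tokens seen in training."""
    tags = [model.get(tok, "O") for tok in tokens]
    return tags, sum(tok in model for tok in tokens) / len(tokens)


if __name__ == "__main__":
    seed_data = [
        [("Alice", "B-PER"), ("visited", "O"), ("Paris", "B-LOC")],
        [("Bob", "B-PER"), ("Smith", "I-PER"), ("left", "O"), ("Rome", "B-LOC")],
    ]
    augmented = mention_replacement(seed_data, n_new=4)
    unlabeled = [["Alice", "left", "Rome"], ["Carol", "visited", "Berlin"]]
    model, data = self_train(seed_data + augmented, unlabeled, toy_train, toy_predict)
    print(len(data))  # 7: 2 seed + 4 augmented + 1 confident pseudo-labeled sentence
    print(data[-1])   # [('Alice', 'B-PER'), ('left', 'O'), ('Rome', 'B-LOC')]
```

Mention replacement keeps the BIO tags consistent because the swapped-in tokens are relabeled with the same entity type. In self-training, the confidence threshold trades the number of pseudo-labeled sentences against the amount of label noise they introduce.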
Keywords/Search Tags: named entity recognition, weakly supervised, active learning, crowdsourcing, data augmentation