Research On Chinese Named Entity Recognition Technology From Sparsely Annotated Data

Posted on: 2020-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: L L Kong
GTID: 2428330575959716
Subject: Computer Science and Technology
Abstract/Summary:
The third wave of artificial intelligence is changing human life. As a branch of artificial intelligence, natural language processing (NLP) helps machines analyze and understand human language, serving as a bridge between human language and machines. Named entity recognition (NER) is one of the basic techniques of NLP, and its performance is crucial for downstream tasks such as information retrieval, recommendation systems, and sentiment analysis. Chinese NER has attracted particular attention within NER research because of the particularity and complexity of the Chinese language. A high-performing model needs a large, high-quality annotated training set in order to generalize, and because such data is expensive to produce, high-quality Chinese annotated data has become one of the biggest bottlenecks limiting the performance of artificial intelligence algorithms. Research on Chinese NER with little labeled data therefore has important practical significance and application value.

This thesis takes Chinese NER in application scenarios with little annotated data as its object of study. Working from two directions, reducing the amount of annotated data required and reducing the labeling cost per sample, we use active learning, transfer learning, and a hybrid of rule-based and statistical methods to reduce the labeling cost needed for the model to reach a given level of performance. The detailed research contents are as follows:

(1) To avoid the limitations of purely uncertainty-based sample selection strategies, we propose a new active learning approach for Chinese NER based on both uncertainty and representativeness, and we compare how several sample selection strategies affect model performance.

(2) We propose a Chinese NER method that combines pre-training with active learning. We propose a BERT-CRF model, which combines the BERT pre-trained language model with a conditional random field and offers strong feature fusion capability and label constraints for NER. We then apply active learning with BERT-CRF to further reduce the amount of data that must be annotated. Comparison experiments demonstrate the effectiveness of the method in reducing the amount of labeled data.

(3) We develop an entity recognition framework that requires no labeled data and takes sample characteristics into account. It combines statistical and rule-based approaches to identify Chinese named entities and then merges the extraction results. The framework can automatically generate predicted labels to assist a human-in-the-loop annotation process. Experiments demonstrate that the method reduces the cost of producing the same number of labels.
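
The sample selection idea in item (1) can be illustrated with a small sketch. The abstract does not give the thesis's actual scoring function, so the code below substitutes a common stand-in: least-confidence uncertainty multiplied by a density-style weight (mean cosine similarity to the rest of the unlabeled pool). All function names, the `beta` parameter, and the toy inputs are hypothetical and chosen only for illustration.

```python
import math


def least_confidence(token_probs):
    """Sentence uncertainty: 1 minus the mean of the per-token maximum probabilities."""
    best = [max(dist) for dist in token_probs]
    return 1.0 - sum(best) / len(best)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def representativeness(index, vectors):
    """Mean cosine similarity between one sentence vector and all others in the pool."""
    others = [v for j, v in enumerate(vectors) if j != index]
    if not others:
        return 0.0
    return sum(cosine(vectors[index], v) for v in others) / len(others)


def select_for_annotation(token_probs_per_sentence, vectors, k, beta=1.0):
    """Return the indices of the k sentences with the highest combined score.

    Assumed score (not taken from the thesis): uncertainty * representativeness ** beta.
    Setting beta=0 reduces this to plain uncertainty sampling.
    """
    scored = []
    for i, token_probs in enumerate(token_probs_per_sentence):
        u = least_confidence(token_probs)
        r = max(representativeness(i, vectors), 0.0)  # clamp negative similarity to 0
        scored.append((u * (r ** beta), i))
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [i for _, i in ranked[:k]]


# Toy pool: per-token label distributions from a tagger, plus one vector per sentence.
pool_probs = [
    [[0.9, 0.1], [0.95, 0.05]],  # sentence 0: confident
    [[0.5, 0.5], [0.6, 0.4]],    # sentence 1: uncertain, similar to sentence 0
    [[0.5, 0.5], [0.5, 0.5]],    # sentence 2: most uncertain, but an outlier
]
pool_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

chosen = select_for_annotation(pool_probs, pool_vectors, k=1)
```

In this toy pool, plain uncertainty sampling (`beta=0`) would pick the outlier sentence, while the combined score prefers the uncertain sentence that resembles the rest of the pool. In practice the per-token probabilities would come from the tagger (for example a BERT-CRF model) and the sentence vectors from its encoder.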
Keywords/Search Tags: Chinese Named Entity Recognition, Active Learning, Transfer Learning, Bidirectional Encoder Representations from Transformers