Research On Tibetan Named Entity Recognition Based On Weakly Supervised Learning

Posted on:2021-01-05

Degree:Master

Type:Thesis

Country:China

Candidate:P Sun

Full Text:PDF

GTID:2438330602498432

Subject:Computer Science and Technology

Abstract/Summary:


Named Entity Recognition (NER) is one of the basic and key tasks of Tibetan information processing. Tibetan NER finds and classifies named entities in Tibetan text, and its results affect the performance of downstream tasks such as Tibetan information extraction and information retrieval. Currently, Tibetan NER relies mainly on supervised statistical machine learning methods. Traditional feature engineering depends on the knowledge and experience of linguists to extract shallow statistical characteristics of named entities, which makes it difficult to represent their semantic information. Expanding the training corpora, meanwhile, is limited by the high cost of manual labeling. It is therefore of great research value to build a high-performance Tibetan NER model from small-scale labeled corpora. This paper studies Tibetan NER based on weakly supervised learning. The main work is as follows:

First, distributed representations of words are learned from unlabeled text, and word representation features that capture semantic information are constructed and added to the statistical machine learning model for Tibetan name recognition, which improves the model's performance. Four kinds of word representation features are studied: the word embedding feature, the binarized word embedding feature, the word embedding clustering feature, and the Brown clustering feature. A weakly supervised Tibetan name recognition model is built on a Conditional Random Fields model. Because related research reports that the word embedding feature and the binarized word embedding feature fail in some NER systems, a novel sampling strategy for word representation features is proposed. Experiments show that word representation features effectively represent the semantic information of named entities, raising the F1 score of the supervised statistical model from 88.66% to 91.90%. Sampling the word representation features makes better use of the word embedding feature and the binarized word embedding feature, and reduces the model's training time by about 90% and 50%, respectively.

Second, by combining active learning and self-learning, a weakly supervised Tibetan NER model is built from unlabeled corpora and small-scale labeled corpora to reduce the cost of labeling. Three active learning sampling strategies are studied (Least Confidence, Maximum Normalized Log-probability, and Content Similarity), and an active learning-based Tibetan NER model is implemented. A confidence-based self-learning sampling strategy is then integrated into the active learning model to construct a weakly supervised Tibetan NER model that combines active learning and self-learning. Experiments show that, compared with a supervised statistical machine learning model for Tibetan NER and without loss of that model's performance, active learning reduces the amount of labeled training corpora by about 74%, and the combination of active learning and self-learning reduces it by about 77%. The combination of active learning and self-learning can therefore reduce the cost of labeling training corpora and has certain advantages over active learning alone.
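
The abstract does not give formulas for its sampling strategies, so the following is only a minimal sketch of how Least Confidence, Maximum Normalized Log-probability, and confidence-based self-learning selection are commonly defined, not the thesis's implementation. The function names, the `(sentence_id, best_path_prob)` pool format, and the threshold value are hypothetical; `best_path_prob` is assumed to be the probability a sequence model (for example a CRF) assigns to its best label sequence for a sentence.

```python
import math
from typing import List, Tuple


def least_confidence(best_path_prob: float) -> float:
    """Uncertainty score: 1 - P(best label sequence | sentence)."""
    return 1.0 - best_path_prob


def normalized_log_prob(best_path_prob: float, length: int) -> float:
    """Log-probability of the best label sequence divided by sentence length.

    Assumed definition: normalizing by length avoids favoring long
    sentences; sentences with the lowest value are the most uncertain.
    """
    return math.log(best_path_prob) / length


def split_pool(
    pool: List[Tuple[str, float]],
    query_size: int,
    self_label_threshold: float,
) -> Tuple[List[str], List[str]]:
    """Split an unlabeled pool for one combined training round.

    The `query_size` most uncertain sentences go to a human annotator
    (active learning). Of the remaining sentences, those whose best-path
    probability reaches `self_label_threshold` keep the model's own
    labels (self-learning). Everything else stays in the pool.
    """
    ranked = sorted(pool, key=lambda item: least_confidence(item[1]), reverse=True)
    to_annotate = [sid for sid, _ in ranked[:query_size]]
    to_self_label = [
        sid for sid, prob in ranked[query_size:] if prob >= self_label_threshold
    ]
    return to_annotate, to_self_label


# Hypothetical pool of (sentence_id, best_path_prob) pairs.
example_pool = [("s1", 0.95), ("s2", 0.40), ("s3", 0.99), ("s4", 0.62)]
annotate, self_label = split_pool(example_pool, query_size=2, self_label_threshold=0.9)
```

With this hypothetical pool, the two least confident sentences (`s2`, `s4`) are sent for manual labeling, while `s1` and `s3` are confident enough to be added with the model's own labels. This is the mechanism by which combining the two strategies can lower manual labeling effort further than active learning alone.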

Keywords/Search Tags:

Tibetan Named Entity Recognition, Weakly Supervised Learning, Word Representation Feature, Combining Active Learning and Self-learning


Related items

1	Research On Tibetan Named Entity Recognition Model Based On Active Learning
2	Weakly Supervised Named Entity Recognition Based On Online Encyclopedia
3	Research Of Word Representations On Biomedical Named Entity Recognition
4	Complex Chinese Named Entity Recognition In Finance
5	Research On Tibetan Named Entity Recognition Based On Deep Learning
6	Research On Key Technologies Of Named Entity Recognition And Linking Based On Representation Learning
7	Image Data Annotation And Recognition Based On Weakly Supervised Deep Learning
8	Research On Entity Recognition Technology For Knowledge Base Construction In Requirement Engineering Domain
9	A Research On Weakly Supervised Relation Extraction
10	The Research Of Weibo Entity Recognition Model Based On Active Learning