Named Entity Recognition Method For Labeling Scarce Problem

Posted on:2024-07-20

Degree:Master

Type:Thesis

Country:China

Candidate:J T Guo

Full Text:PDF

GTID:2568307115464084

Subject:Computer Science and Technology

Abstract/Summary:

Named entity recognition (NER) plays an important role in natural language processing, providing basic support and optimization for many downstream tasks. Advanced models and algorithms require large amounts of training data, but real datasets are hard to obtain; in specialized fields in particular, annotating datasets consumes substantial human and financial resources. Because data annotation is expensive and active learning can greatly reduce the amount of annotation required, designing NER methods based on active learning for annotation-scarce domains has great practical value. This thesis focuses on active learning for named entity recognition: when annotated data is scarce, it aims to minimize manual annotation cost while maintaining performance. The research covers the following two parts.

(1) To address the problem that the Transformer's fully connected structure depends heavily on labeled data, a lattice named entity recognition method based on global nodes and multiple segments is proposed. First, to reduce annotation cost while maintaining accuracy, a structure based on a global node and multiple segments replaces the fully connected Transformer structure in the FLAT model, lowering the requirement for annotated data. Then, combining this structure with the FLAT lattice idea both avoids word segmentation and makes effective use of word boundary information. Evaluation on four NER datasets, MSRA, OntoNotes5, Weibo, and People's Daily, shows that the proposed lattice method reduces the amount of annotated data required by 39.9%, 2.17%, 34.6%, and 35.67%, respectively, compared with the FLAT model.

(2) To further improve the utilization of labeled data, a named entity recognition method combining active learning and data expansion is proposed. First, after high-value samples are selected by the active learning strategy, the entities in those samples are randomly replaced with entities of the same type, which expands the set of high-value samples. Then, to make further use of the output parameters of the deep learning model within the active learning strategy, a sentence margin strategy is proposed. The overall score of an unlabeled sample is calculated from each span probability and the transition matrix of the CRF layer, and the maximum difference between the scores of the two predicted label sequences is taken as the criterion for selecting unlabeled samples. Evaluation on the same four NER datasets, MSRA, OntoNotes5, Weibo, and People's Daily, shows that the proposed method reaches 99.1%, 95.9%, 98.9%, and 99.2% of the original model's F1 score, respectively, while using only 38% of the data, verifying the effectiveness of the active learning strategy and the data expansion method.
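
The two steps of part (2) can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes per-token emission scores and a CRF transition matrix given as plain lists, takes the "two prediction series" to be the best and second-best label sequences (found here with a 2-best Viterbi pass), and prefers small margins, following the usual margin-sampling convention; the abstract does not state these details. All function names and the `entity_bank` structure are hypothetical.

```python
import heapq
import random


def sequence_score(emissions, transitions, labels):
    """Score of one label sequence: emission scores plus CRF transition scores."""
    score = emissions[0][labels[0]]
    for t in range(1, len(labels)):
        score += transitions[labels[t - 1]][labels[t]] + emissions[t][labels[t]]
    return score


def top2_scores(emissions, transitions):
    """Scores of the best and second-best label sequences (2-best Viterbi).

    Assumes at least two labels, so that two distinct sequences exist.
    """
    n_labels = len(emissions[0])
    # best[j]: up to two highest scores of partial paths ending in label j.
    best = [[emissions[0][j]] for j in range(n_labels)]
    for t in range(1, len(emissions)):
        best = [
            heapq.nlargest(2, [s + transitions[i][j] + emissions[t][j]
                               for i in range(n_labels) for s in best[i]])
            for j in range(n_labels)
        ]
    return heapq.nlargest(2, [s for paths in best for s in paths])


def sentence_margin(emissions, transitions):
    """Gap between the two highest sequence scores for one sentence."""
    top, second = top2_scores(emissions, transitions)
    return top - second


def select_by_margin(pool, transitions, k):
    """Indices of the k sentences with the smallest margin.

    Assumption: a small margin marks an uncertain, high-value sample.
    """
    order = sorted(range(len(pool)),
                   key=lambda i: sentence_margin(pool[i], transitions))
    return order[:k]


def replace_entities(tokens, spans, entity_bank, rng):
    """Swap each entity for a random entity of the same type.

    spans: sorted, non-overlapping (start, end, type) triples, end exclusive.
    entity_bank: hypothetical dict mapping a type to a list of token lists.
    """
    new_tokens, new_spans, cursor = [], [], 0
    for start, end, etype in spans:
        new_tokens.extend(tokens[cursor:start])  # copy text before the entity
        replacement = rng.choice(entity_bank[etype])
        new_start = len(new_tokens)
        new_tokens.extend(replacement)
        new_spans.append((new_start, len(new_tokens), etype))
        cursor = end
    new_tokens.extend(tokens[cursor:])  # copy text after the last entity
    return new_tokens, new_spans
```

In a loop built on these helpers, `select_by_margin` would pick the sentences to send for annotation, and `replace_entities` would then be applied to the newly labeled sentences with an `entity_bank` collected from the entities already annotated.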

Keywords/Search Tags:

Global node, Multiple segment, Lattice, Named entity recognition, Active learning

Related items

1	Chinese Named Entity Recognition Based On Neural Network
2	The Research Of Weibo Entity Recognition Model Based On Active Learning
3	Named Entity Recognition On Global Search
4	Research On Tibetan Named Entity Recognition Model Based On Active Learning
5	Research On Chinese Named Entity Recognition Technology From Sparsely Annotated Data
6	Research On Chinese Named Entity Recognition Based On Deep Learning
7	Research On Named Entity Recognition Based On Global Information
8	Research And Implementation Of Chinese Named Entity Recognition Based On Lattice-LSTM Model
9	Research And Application Of Chinese Named Entity Recognition
10	Research On Chinese Named Entity Recognition Method Based On Multi-Granularity Feature Fusion