
Named Entity Recognition Methods for the Annotation Scarcity Problem

Posted on: 2024-07-20  Degree: Master  Type: Thesis
Country: China  Candidate: J T Guo  Full Text: PDF
GTID: 2568307115464084  Subject: Computer Science and Technology
Abstract/Summary:
Named entity recognition (NER) plays an important role in natural language processing, providing basic support and optimization for many natural language processing tasks. Advanced models and algorithms require large amounts of training data, but real datasets are hard to obtain, and in specialized fields annotation consumes considerable human and financial resources. Because active learning can greatly reduce the amount of data that must be annotated, designing active-learning-based NER methods for annotation-scarce domains has significant practical value. This thesis focuses on active learning for NER: when annotated data is scarce, it aims to minimize manual annotation cost while preserving performance. The research covers the following two contributions:

(1) To address the heavy dependence of the Transformer's fully connected structure on labeled data, a lattice NER method based on global nodes and multiple segments is proposed. First, to reduce annotation cost while maintaining accuracy, a structure based on a global node and multiple segments replaces the fully connected Transformer structure in the FLAT model, lowering the amount of annotated data required. Then, combining this structure with the FLAT lattice idea both avoids word segmentation and makes effective use of lexical boundary information. Evaluation on four NER datasets, MSRA, OntoNotes5, Weibo, and People Daily, shows that the proposed method reduces the amount of annotated data required by 39.9%, 2.17%, 34.6%, and 35.67%, respectively, compared with the FLAT model.

(2) To further improve the utilization of labeled data, an NER method combining active learning and data expansion is proposed. First, after high-value samples are selected by the active learning strategy, entities in those samples are randomly replaced with entities of the same type, which expands the set of high-value samples. Then, to make further use of the deep learning model's output parameters in the active learning strategy, a sentence margin strategy is proposed. The overall score of each unlabeled sample is computed from the per-span probabilities and the transition matrix of the CRF layer, and the maximum difference between the scores of the two predicted sequences is taken as the criterion for selecting unlabeled samples. Evaluation on the same four NER datasets, MSRA, OntoNotes5, Weibo, and People Daily, shows that the proposed method reaches 99.1%, 95.9%, 98.9%, and 99.2% of the original model's F1 score, respectively, while using only 38% of the data, verifying the effectiveness of the active learning strategy and the data expansion method.
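The data expansion step in (2), replacing entities with others of the same type, can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes BIO-tagged token sequences, and the names `extract_entities`, `augment`, and `entity_pool` are hypothetical, introduced here for illustration only.

```python
import random


def extract_entities(tags):
    """Return (start, end, type) spans from BIO tags; end is exclusive."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def augment(tokens, tags, entity_pool, rng):
    """Swap every entity for a random same-type entity from entity_pool.

    entity_pool (hypothetical) maps an entity type to a list of token lists.
    Entities whose type has no candidates are left unchanged.
    """
    new_tokens, new_tags, cursor = [], [], 0
    for start, end, etype in extract_entities(tags):
        # Copy the non-entity text that precedes this entity.
        new_tokens += tokens[cursor:start]
        new_tags += tags[cursor:start]
        candidates = entity_pool.get(etype)
        replacement = rng.choice(candidates) if candidates else tokens[start:end]
        new_tokens += replacement
        # Re-tag the replacement, which may differ in length from the original.
        new_tags += [f"B-{etype}"] + [f"I-{etype}"] * (len(replacement) - 1)
        cursor = end
    new_tokens += tokens[cursor:]
    new_tags += tags[cursor:]
    return new_tokens, new_tags


# Usage: expand one selected sample into a new labeled sample.
pool = {"PER": [["Bob"], ["Mary", "Ann"]], "LOC": [["Paris"], ["Hong", "Kong"]]}
sample_tokens = ["Alice", "visited", "New", "York"]
sample_tags = ["B-PER", "O", "B-LOC", "I-LOC"]
print(augment(sample_tokens, sample_tags, pool, random.Random(0)))
```

Because each replacement is re-tagged from its own length, the expanded sample stays correctly labeled without further manual annotation, which is the point of applying the expansion only to samples already chosen for labeling.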
Keywords/Search Tags:Global node, Multiple segment, Lattice, Named entity recognition, Active learning