Design And Implementation Of A Small-Sample Named Entity Recognition Method In The Field Of Operation And Maintenance

Posted on:2023-09-12

Degree:Master

Type:Thesis

Country:China

Candidate:C Wang

Full Text:PDF

GTID:2568306914963379

Subject:Computer technology

Abstract/Summary:

Since the introduction of deep learning methods, the performance of named entity recognition in the general domain has approached human level. However, performance in specific domains (such as operation and maintenance) still needs improvement. This thesis therefore studies how to apply deep learning to named entity recognition in small-sample scenarios. It selects the BERT-CRF method as the baseline model and improves named entity recognition in small-sample scenarios in the field of operation and maintenance from the following four aspects.

Firstly, this thesis proposes an abstract-label named entity recognition model based on prior knowledge. By simplifying the transition relationships between labels in the transition probability matrix of the conditional random field, the impact of differences in sample counts between entity types is reduced, and the learning space is reduced. A label weight vector method is also proposed to increase the discrimination between different labeling paths. In addition, a prior knowledge mask matrix method is proposed, which uses prior knowledge of the transitions between labels in the named entity recognition task to provide a better training starting point and avoid invalid learning. Together, these increase the model's F1 value by 8%.

Secondly, this thesis proposes a data augmentation method based on self-training. The teacher model labels the unlabeled data, high-quality labeling results are selected and added to the training set, and the result is provided to the student model for training. During this process, the two models share the parameters of the word embedding layer and of the conditional random field in the decoding layer. In addition, a word boundary scoring method is designed. Through multiple rounds of training and selective admission, the domain knowledge contained in the unlabeled data set is better utilized, and the model's F1 value is increased by 4%.

Thirdly, based on the aggregation effect of entity tags in named entity recognition, this thesis proposes a named entity recognition framework based on multi-task joint learning. An entity versus non-entity binary classification task is added alongside the sequence tagging task, and the two tasks share the word embedding layer. This accelerates the convergence of non-entity learning and helps to delimit entity boundaries. Label smoothing is also applied to the loss function to avoid over-fitting between entities and non-entities, and the F1 value of the model is increased by 3%.

Finally, combining the above methods, a named entity recognition model for the field of operation and maintenance, MSTBERT-CDT++, is proposed, which improves the F1 value of the baseline model by 15%. In addition, to support data increment and model iteration, a small-sample incremental named entity recognition system is designed and implemented based on this model. It provides model management, data management, and analysis and modeling task management functions, and can convert multi-format documents in the field of operation and maintenance.
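The prior knowledge mask matrix idea can be illustrated with a minimal sketch. The abstract does not give the thesis's tag scheme or the exact form of its mask, so the example below assumes a standard BIO scheme, uses hypothetical entity types (`HOST`, `SERVICE`), and suppresses invalid label transitions by adding a large negative score to the CRF transition matrix. It is one plausible reading of the idea, not the author's implementation.

```python
# Hypothetical BIO label set; the thesis's actual tag scheme is not
# stated in the abstract, and HOST / SERVICE are made-up entity types.
LABELS = ["O", "B-HOST", "I-HOST", "B-SERVICE", "I-SERVICE"]

# Large negative score used to suppress invalid transitions (assumed value).
NEG_INF = -1e4


def is_valid_transition(prev: str, curr: str) -> bool:
    """Return True if label `curr` may follow label `prev` under BIO."""
    if curr.startswith("I-"):
        entity_type = curr[2:]
        # I-X may only continue an entity of the same type X.
        return prev in (f"B-{entity_type}", f"I-{entity_type}")
    # O and B-X may follow any label.
    return True


def build_transition_mask(labels):
    """mask[i][j] is 0.0 if labels[i] -> labels[j] is allowed, else NEG_INF."""
    return [
        [0.0 if is_valid_transition(prev, curr) else NEG_INF for curr in labels]
        for prev in labels
    ]


def apply_mask(transitions, mask):
    """Add the mask element-wise to a transition score matrix."""
    return [
        [score + m for score, m in zip(row, mask_row)]
        for row, mask_row in zip(transitions, mask)
    ]


MASK = build_transition_mask(LABELS)

# Example: start from an all-zero transition matrix and apply the mask,
# so that invalid transitions already carry a strongly negative score
# before any training step.
INITIAL = [[0.0] * len(LABELS) for _ in LABELS]
MASKED = apply_mask(INITIAL, MASK)
```

With such a mask, a transition like `O -> I-HOST` starts with a strongly negative score, so the model does not need to learn from scarce samples that it is invalid, which matches the abstract's stated goal of a better training starting point.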

Keywords/Search Tags:

named entity recognition, abstract label, prior knowledge mask matrix, self-training, multi-task learning

Related items

1	Research On Multi-Label Text Classification And Unified Named Entity Recognition Under The Background Of Public Opinion Analysis
2	Chinese Named Entity Recognition Technology For Enterprise Knowledge Graph Construction
3	Research On Key Technologies Of Constructing Domain Knowledge Map
4	Named Entity Recognition With Multi-Grained Representation Learning
5	Research On Chinese Named Entity Recognition Method And Its Application In The Field Of Administrative Work Report
6	Multi-task Learning For Chinese Named Entity Recognition
7	Research On Cross-domain Named Entity Recognition Method
8	Candidate Region Aware Nested Named Entity Recognition
9	Research On Named Entity Recognition Algorithm And Its Implement In Specific Fields
10	Research And Implementation Of Named Entity Recognition Based On Deep Learning