
Design and Implementation of a Small-Sample Named Entity Recognition Method in the Field of Operation and Maintenance

Posted on: 2023-09-12  Degree: Master  Type: Thesis
Country: China  Candidate: C Wang
GTID: 2568306914963379  Subject: Computer technology
Abstract/Summary:
Since the introduction of deep learning methods, the performance of named entity recognition in the general domain has approached human level. However, performance in specific domains (such as operation and maintenance) still needs to improve. This thesis therefore studies how to apply deep learning to named entity recognition in small-sample scenarios. It selects the BERT-CRF method as the baseline model and improves the named entity recognition model for small-sample scenarios in the field of operation and maintenance in the following four respects.

First, this thesis proposes an abstract-label named entity recognition model based on prior knowledge. By simplifying the transition relationships between labels in the transition probability matrix of the conditional random field, the model reduces the impact of differences in sample counts between entity types and shrinks the learning space. A label weight vector method is also proposed to increase the discrimination between different labeling paths. In addition, a prior knowledge mask matrix method is proposed, which uses prior knowledge about transitions between labels in the named entity recognition task to provide a better training starting point and avoid invalid learning. Together these increase the model's F1 value by 8%.

Second, this thesis proposes a data augmentation method based on self-training. A teacher model labels the unlabeled data, high-quality labeling results are selected and added to the training set, and the enlarged training set is used to train a student model. During this process the two models share the parameters of the word embedding layer and of the conditional random field in the decoding layer. A word boundary scoring method is also designed. Through multiple rounds of training and selective admission, the domain knowledge contained in the unlabeled data set is better utilized, and the model's F1 value is increased by 4%.

Third, based on the aggregation effect of entity tags in named entity recognition, this thesis proposes a named entity recognition framework based on multi-task joint learning. The framework adds an entity versus non-entity binary classification task alongside the sequence tagging task, with the two tasks sharing the word embedding layer. This accelerates the convergence of non-entity learning and helps to delimit entity boundaries. A label smoothing mechanism is applied to the loss function to avoid over-fitting between entities and non-entities. The F1 value of the model is increased by 3%.

Finally, combining the above methods, a named entity recognition model for the field of operation and maintenance, MSTBERT-CDT++, is proposed, which improves the F1 value of the baseline model by 15%. To support incremental data and model iteration, a small-sample incremental named entity recognition system is designed and implemented on top of this model. It provides model management, data management, and analysis and modeling task management functions, and can convert multi-format documents from the field of operation and maintenance.
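
As an illustration of the prior knowledge mask matrix idea described above, the following is a minimal sketch and not the thesis's actual construction. It assumes a BIO tagging scheme (the abstract does not name the scheme), uses hypothetical entity types `HOST` and `SERVICE`, and uses an assumed penalty constant `NEG_INF`. The mask holds 0 for transitions that the tagging scheme allows and a large negative value for transitions it forbids, so that it could be added to a CRF transition score matrix before training.

```python
from typing import List, Tuple

# Assumed penalty for forbidden transitions; the thesis does not state a value.
NEG_INF = -1e4


def build_bio_transition_mask(entity_types: List[str]) -> Tuple[List[str], List[List[float]]]:
    """Return (labels, mask) where mask[i][j] scores the transition labels[i] -> labels[j].

    Under the assumed BIO scheme, an I-X label may only follow B-X or I-X
    of the same entity type X. Every other transition is allowed.
    """
    labels = ["O"] + [f"{prefix}-{t}" for t in entity_types for prefix in ("B", "I")]

    def allowed(prev: str, nxt: str) -> bool:
        if not nxt.startswith("I-"):
            return True  # transitions into O or any B-X are always valid
        # I-X requires the previous label to be B-X or I-X with the same X
        return prev != "O" and prev[2:] == nxt[2:]

    mask = [[0.0 if allowed(p, n) else NEG_INF for n in labels] for p in labels]
    return labels, mask


# Hypothetical entity types chosen only for illustration.
labels, mask = build_bio_transition_mask(["HOST", "SERVICE"])
```

Adding such a mask to the initial transition scores means the model starts from a point where invalid label sequences are already penalized, rather than having to learn that from a small number of samples.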
Keywords/Search Tags: named entity recognition, abstract label, prior knowledge mask matrix, self-training, multi-task learning