Research On Named Entity Recognition Method And Application Based On Transformer

Posted on: 2022-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Huang
Full Text: PDF
GTID: 2518306575466604
Subject: Computer technology
Abstract/Summary:
Named entity recognition is an essential part of many data analysis tasks. At present, named entity recognition comprises two main tasks: the recognition of conventional (flat) entities and the recognition of nested entities. Although deep-learning-based named entity recognition can identify entities to a certain extent, it still struggles to recognize the boundaries of new entities and entities that are nested within one another. The Transformer has advantages over traditional feature extractors for semantic feature extraction: it can jointly consider local and global features to obtain rich semantic information from the text and effectively identify entity boundaries. This thesis therefore applies Transformer theory to research on named entity recognition methods. The main work of this study is as follows.

1. Combining Transformer theory, this thesis proposes a Transformer-based method that fuses features at multiple granularities. First, the stroke-writing-order feature is added to the cw2vec vector as a spatial representation of the text, enriching the features and avoiding the OOV (out-of-vocabulary) problem. Second, the Transformer extracts character-level semantic features in multiple independent subspaces, and the extracted features are fused with global features to enrich the information the model can learn. Finally, a denoising algorithm based on the attention mechanism is proposed: during training, noisy data are identified and dynamically removed, correcting the model's learning direction. Experimental evaluations show that the proposed method effectively improves named entity recognition.

2. The BERT pre-trained model is trained at the character level, and each Chinese character carries different meanings in different words, so using the pre-trained model alone for vector embedding easily loses word-level information. To integrate word semantics into the BERT pre-trained model, this thesis proposes a word-aligned Transformer (Watransformer) algorithm that extends BERT and improves the overall effect of nested entity recognition. First, to make the word segmentation results as accurate as possible, unrecognized nested sub-entities must be merged. In addition to the conventional named entity recognition model, a sliding window is introduced: if the distance between two sub-entities falls within the window, they are merged into a new entity, and this result is combined with the output of the word segmentation tool to obtain an optimized segmentation. Second, the word vectors produced by BERT are fed into the Transformer to obtain each character's attention weights over the other characters; the optimized segmentation partitions this attention matrix at the word level, and the attention weights within each word span are mean-pooled. Then, following the computation of the attention mechanism, word-level semantics are fused with character-level semantics to obtain the final vector representation. Experimental results show that the proposed method effectively improves named entity recognition and compares favorably with other NER models.
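The abstract does not detail how the attention-based denoising identifies noisy data, so the PyTorch sketch below is only one plausible reading: tokens that receive the least attention mass are masked out of the training loss. The function name, the keep_ratio parameter, and the thresholding rule are all assumptions for illustration.

```python
import torch

def denoised_loss(per_token_loss, attn_weights, keep_ratio=0.9):
    """Hypothetical attention-based denoising (the selection rule is
    assumed, not taken from the thesis): tokens receiving the least
    attention mass are treated as noise and dynamically removed from
    the training loss, correcting the model's learning direction.

    per_token_loss: (batch, seq_len) unreduced loss per token
    attn_weights:   (batch, heads, seq_len, seq_len) self-attention
    """
    # Attention mass each token receives, averaged over heads and queries.
    received = attn_weights.mean(dim=1).mean(dim=1)      # (batch, seq_len)
    # Keep only the top keep_ratio fraction of tokens in each sentence.
    k = max(1, int(received.size(1) * keep_ratio))
    cutoff = received.topk(k, dim=1).values[:, -1:]      # per-sentence threshold
    keep = (received >= cutoff).float()
    return (per_token_loss * keep).sum() / keep.sum().clamp(min=1.0)
```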
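The sliding-window merging of sub-entities in the second method is described concretely enough to sketch directly. Spans are assumed to be (start, end) character offsets with an exclusive end, and the window size of 2 is an illustrative default; the abstract does not fix a value.

```python
def merge_sub_entities(sub_entities, window=2):
    """Merge nested sub-entities whose gap lies within the sliding
    window into a single candidate entity; the result can then be
    combined with the word segmentation tool's output."""
    merged = []
    for start, end in sorted(sub_entities):
        if merged and start - merged[-1][1] <= window:
            # Gap is within the window: fuse into one new entity.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Two sub-entities separated by one character are fused:
# merge_sub_entities([(0, 3), (4, 7)]) -> [(0, 7)]
```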
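The word-aligned pooling and fusion at the heart of Watransformer can be sketched as below. The shapes and the additive fusion are a plausible reading of the abstract rather than the thesis's actual implementation; word_spans stands for the optimized segmentation produced above.

```python
import torch

def word_aligned_fusion(attn, char_states, word_spans):
    """Sketch of the word-aligned step: partition a character-level
    attention matrix by word spans, mean-pool the weights inside each
    span, and fuse the resulting word semantics back into each
    character representation.

    attn:        (seq_len, seq_len) character-to-character attention
    char_states: (seq_len, dim) character vectors from BERT
    word_spans:  list of (start, end) spans, end exclusive
    """
    fused = char_states.clone()
    for start, end in word_spans:
        # Mean-pool each character's attention over this word's span,
        # yielding one weight per word rather than per character.
        word_attn = attn[:, start:end].mean(dim=1)        # (seq_len,)
        # Word-level value vector: the mean of its component characters.
        word_vec = char_states[start:end].mean(dim=0)     # (dim,)
        # Attention-style fusion of word semantics into every character.
        fused = fused + word_attn.unsqueeze(1) * word_vec.unsqueeze(0)
    return fused
```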
Keywords/Search Tags:named entities, nested entities, transformer, deep learning, multiple granularity