Research on Chinese Named Entity Recognition Based on Vocabulary Enhancement and Multi-feature

Posted on: 2022-01-22 | Degree: Master | Type: Thesis
Country: China | Candidate: Z P Xu | Full Text: PDF
GTID: 2518306572997299 | Subject: Computer technology
Abstract/Summary:
Named entity recognition is an important component and basic technology of information extraction systems. Its goal is to extract words that carry specific meaning or refer to particular entities from unstructured text, and its results are used in relation extraction, text summarization and other downstream tasks. In Chinese, a sentence is written as a continuous sequence of characters with no explicit word boundaries. Although research on Chinese word segmentation has made progress, segmentation errors still occur and degrade the recognition performance of downstream models. Chinese named entity recognition models are therefore usually built at the character level. Character-level models, however, ignore the semantic information of Chinese vocabulary, and that vocabulary information is of great significance for determining entity boundaries and types.

This paper combines a Bidirectional Long Short-Term Memory network and a Convolutional Neural Network to extract features of Chinese characters, so that the extracted features capture both long-distance, position-dependent information and local morphological information. Without performing word segmentation, the vocabulary information corresponding to each character is introduced through string pattern matching. It covers four cases: words that start with the character, words that contain the character in the middle, words that end with the character, and words that consist of the single character. A word-frequency-weighted average is used to extract the vocabulary features corresponding to each character. The pre-trained BERT-wwm model is used to extract pre-trained features of the character sequence. Vocabulary features are integrated into character features through a gate mechanism to achieve vocabulary enhancement. The vocabulary-enhanced character features and the pre-trained features are then concatenated to form a multi-feature strategy, which improves the recognition metrics of the model.

Experiments, including comparison experiments and ablation experiments, are conducted on three benchmark datasets for Chinese named entity recognition: Resume, MSRA and Weibo. The experimental results show that the proposed model improves precision, recall and F1 compared with existing models.
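The vocabulary-enhancement steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names (match_lexicon, weighted_average, gate_fuse), the toy lexicon, its frequencies, the two-dimensional embeddings and the element-wise form of the gate are all assumptions made for illustration, and the abstract does not specify the actual gate equations or dimensions.

```python
import math


def match_lexicon(sentence, lexicon, max_len=4):
    """For each character, collect the lexicon words that match it.

    B: words starting with the character, M: words containing it in the
    middle, E: words ending with it, S: the character alone as a word.
    No word segmentation is run; only substring lookup is used.
    """
    sets = [{"B": [], "M": [], "E": [], "S": []} for _ in sentence]
    n = len(sentence)
    for i in range(n):
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = sentence[i:j]
            if word not in lexicon:
                continue
            if j - i == 1:
                sets[i]["S"].append(word)
            else:
                sets[i]["B"].append(word)
                sets[j - 1]["E"].append(word)
                for k in range(i + 1, j - 1):
                    sets[k]["M"].append(word)
    return sets


def weighted_average(words, lexicon, embeddings, dim):
    """Average word embeddings, weighting each word by its frequency."""
    total = sum(lexicon[w] for w in words)
    if total == 0:
        return [0.0] * dim  # no matched word: zero vector
    out = [0.0] * dim
    for w in words:
        for d in range(dim):
            out[d] += lexicon[w] * embeddings[w][d] / total
    return out


def gate_fuse(char_vec, word_vec, w_char, w_word, bias):
    """Assumed element-wise gate: g = sigmoid(w_char*c + w_word*v + b),
    output = g*c + (1-g)*v. Both vectors must have the same length."""
    fused = []
    for c, v, wc, wv, b in zip(char_vec, word_vec, w_char, w_word, bias):
        g = 1.0 / (1.0 + math.exp(-(wc * c + wv * v + b)))
        fused.append(g * c + (1.0 - g) * v)
    return fused


# Toy data (hypothetical): word -> frequency, and word -> embedding.
lexicon = {"南京": 5, "南京市": 3, "市长": 2, "长江": 4, "长江大桥": 2, "大桥": 3}
embeddings = {"南京": [1.0, 0.0], "南京市": [0.0, 1.0]}

sets = match_lexicon("南京市长江大桥", lexicon)
b_feature = weighted_average(sets[0]["B"], lexicon, embeddings, dim=2)
fused = gate_fuse([1.0, 0.0], b_feature, [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
print(sets[0]["B"], b_feature, fused)
```

In a full model, the four per-set features would presumably be combined and projected to the character feature dimension before gating, and the gate weights would be learned rather than fixed; those details are left out here because the abstract does not give them.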
Keywords/Search Tags: Chinese Named Entity Recognition, Vocabulary Enhancement, Gate Mechanism, Multi-feature, BERT-wwm