Chinese Named Entity Recognition Based On Semantic Vectors Integration

Posted on: 2020-11-25
Degree: Master
Type: Thesis
Country: China
Candidate: M Cui
Full Text: PDF
GTID: 2428330575489048
Subject: Computer technology
Abstract/Summary:
With the rapid development of information technology, people's lives are filled with massive amounts of text data. The task of named entity recognition (NER) is to extract entities with specific meaning from text. As a key task in natural language processing, it is the basis of public opinion monitoring and information extraction, and its quality has a great impact on subsequent tasks. The NER task addressed in this thesis is to extract the names of people, places, and institutions from text. Because traditional methods are costly and generalize poorly, this thesis adopts a method based on semantic vector integration to recognize Chinese entities.

First, BiLSTMs are used for modeling, and a BiLSTMs+CRF model is constructed by adding a Conditional Random Field (CRF) layer to improve recognition. Taking characters as input avoids the problem of Chinese word segmentation errors degrading recognition. In the model, the pre-trained character vectors and word vectors are concatenated to obtain a new set of integrated semantic vectors. A CNN-BiLSTM+CRF model is then constructed, in which a Convolutional Neural Network (CNN) is used to extract fine-grained features. To address the slow computation of sequence models, the concatenated character and word vectors are fed into an Iterated Dilated Convolutional Neural Network (ID-CNN). The dilation adds no parameters to the model, yet it lets the network cover more of the text and extract more usable features.

Experiments with the BiLSTM, BiLSTM+CRF, BiLSTMs, and BiLSTMs+CRF models show that concatenating character vectors and word vectors outperforms using either alone; F1 reaches up to 89.64% after semantic vector fusion. The CNN-BiLSTM+CRF model reaches an F1 of 90.08%, and the ID-CNN model reaches an F1 of 89.22% with a training time only one third of that of the sequence models above. Although the F1 of the ID-CNN model is not the highest, the result still demonstrates the validity of the model for named entity recognition. Finally, model ensembling raises F1 to 90.31%.
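
Two ideas in the abstract can be illustrated with a small sketch: splicing two vectors into one integrated vector, and the observation that dilation widens the text span a convolution stack covers without adding parameters. The code below is only an illustration under stated assumptions, not the thesis's implementation: the function names, the kernel size of 3, the dilation schedule 1, 2, 4, and the channel count are all hypothetical choices that do not come from the abstract.

```python
from typing import List, Sequence


def concat_vectors(first: Sequence[float], second: Sequence[float]) -> List[float]:
    """Splice two vectors end to end (hypothetical helper for vector integration)."""
    return list(first) + list(second)


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Input positions seen by one output of a stack of 1-D dilated convolutions."""
    field = 1
    for dilation in dilations:
        # Each layer widens the span by (kernel_size - 1) * dilation positions.
        field += (kernel_size - 1) * dilation
    return field


def parameter_count(kernel_size: int, channels: int, num_layers: int) -> int:
    """Weights plus biases of the stack; the dilation rate does not appear here."""
    return num_layers * (kernel_size * channels * channels + channels)


if __name__ == "__main__":
    # Assumed toy sizes: a 3-dimensional vector spliced with a 2-dimensional one.
    integrated = concat_vectors([0.1, 0.2, 0.3], [0.7, 0.9])
    print(len(integrated))

    # Same number of layers and parameters, different coverage of the text.
    print(receptive_field(3, [1, 1, 1]), receptive_field(3, [1, 2, 4]))
    print(parameter_count(3, 100, 3))
```

With these assumed settings, three ordinary convolution layers cover 7 positions while three layers dilated by 1, 2, and 4 cover 15, and the parameter count is identical in both cases because it depends only on kernel size, channel width, and depth.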
Keywords/Search Tags: Named Entity Recognition, Bi-directional Long Short-Term Memory, Conditional Random Fields, Dilated Convolution, Ensemble