The Research On Chinese Personal Name Recognition Based On Recurrent Neural Networks

Posted on: 2017-10-30 | Degree: Master | Type: Thesis
Country: China | Candidate: X F Xu
GTID: 2348330488458158 | Subject: Computer application technology
Abstract/Summary:
Chinese personal name recognition is a fundamental task in Chinese information processing, and its performance directly affects other tasks. Because Chinese personal names are formed with great freedom, they account for a large proportion of unknown (out-of-vocabulary) words, so unknown word recognition cannot be solved until personal name recognition is solved first. Solving Chinese personal name recognition is therefore significant. Existing statistical approaches to Chinese personal name recognition suffer from highly complex feature selection and depend on manual intervention. To address these problems, this thesis proposes a Chinese personal name recognition method based on recurrent neural networks (RNNs) that uses word embeddings as its only feature, which reduces both the complexity of feature selection and the influence of manual intervention on the experimental results. In addition, word embeddings trained on unannotated Chinese data contain rich semantic information.
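As a rough illustration of a tagger whose only input is a sequence of word embeddings, the following is a minimal sketch of an Elman-style RNN forward pass in plain Python. The BIO tag set (`O`, `B-PER`, `I-PER`), the layer sizes, and the names `ElmanTagger` and `predict` are assumptions made for illustration, not details taken from the thesis. The weights are random rather than trained, so the predicted tags carry no meaning until the model is trained on annotated data.

```python
import math
import random

# Hypothetical tag set: B-PER / I-PER mark the beginning / inside of a personal name, O marks other words.
TAGS = ["O", "B-PER", "I-PER"]


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class ElmanTagger:
    """h_t = tanh(W x_t + U h_{t-1} + b), y_t = softmax(V h_t + c); x_t is a word embedding."""

    def __init__(self, emb_dim, hidden_dim, num_tags=len(TAGS), seed=0):
        rng = random.Random(seed)
        self.W = rand_matrix(hidden_dim, emb_dim, rng)     # input -> hidden
        self.U = rand_matrix(hidden_dim, hidden_dim, rng)  # hidden -> hidden (the recurrence)
        self.b = [0.0] * hidden_dim
        self.V = rand_matrix(num_tags, hidden_dim, rng)    # hidden -> tag scores
        self.c = [0.0] * num_tags
        self.hidden_dim = hidden_dim

    def forward(self, embeddings):
        """Return one probability distribution over TAGS per input word embedding."""
        h = [0.0] * self.hidden_dim
        outputs = []
        for x in embeddings:
            pre = [wx + uh + bias for wx, uh, bias in zip(matvec(self.W, x), matvec(self.U, h), self.b)]
            h = [math.tanh(z) for z in pre]
            logits = [s + bias for s, bias in zip(matvec(self.V, h), self.c)]
            top = max(logits)  # subtract the max for a numerically stable softmax
            exps = [math.exp(z - top) for z in logits]
            total = sum(exps)
            outputs.append([e / total for e in exps])
        return outputs

    def predict(self, embeddings):
        """Pick the most probable tag for each word independently."""
        return [TAGS[max(range(len(p)), key=p.__getitem__)] for p in self.forward(embeddings)]


# Toy 4-dimensional embeddings for a three-word sentence.
sentence = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.1, -0.1, 0.2], [0.0, 0.0, 0.4, -0.3]]
tagger = ElmanTagger(emb_dim=4, hidden_dim=8)
print(tagger.predict(sentence))  # one tag per word; arbitrary until trained
```

Because the hidden state `h` is carried from word to word, each tag decision can depend on the preceding context without any hand-crafted features.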
Using these embeddings as the model input therefore lets the model learn more information. The method has two stages: model construction and post-processing. In the model construction stage, we focus on optimizing the word embeddings and propose three strategies:
(1) replace the randomly initialized word embeddings of the RNN with embeddings trained by word2vec;
(2) unify the representation of numerals in the corpus used to train the word embeddings through a numeral generalization operation;
(3) integrate feature information into the word embeddings by modifying the word2vec source code.
Experimental results show that optimizing the word embeddings improves the F-score of personal name recognition by 2.23%.
In the post-processing stage, we filter the candidate names identified by the recognition model with a rule set to improve precision. Moreover, some names that are recognized at one position in a document are missed at other positions where the context is insufficient, so we recall these names with a document-level global diffusion operation. Since articles often refer to a person by only the surname or only the given name, we also apply a document-level local diffusion operation to recall such partial references. The results show that rule filtering and the diffusion operations improve the F-score of personal name recognition by 4.74%.
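Strategies (1) and (2) of the model construction stage can be sketched roughly as follows. This assumes the word2vec vectors have already been loaded into a plain `dict` mapping each word to a list of floats; the placeholder token `<NUM>`, the numeral character set, and the helper names `generalize_numerals` and `build_embedding_matrix` are hypothetical choices. Strategy (3), which modifies word2vec itself, is not shown.

```python
import random
import re

# Hypothetical placeholder that replaces every numeral token before word2vec training.
NUM_TOKEN = "<NUM>"

# Assumed, non-exhaustive numeral characters: ASCII digits, full-width digits, Chinese numerals.
_DIGITS = "0-9０-９零〇一二三四五六七八九十百千万亿两"
# A numeral token: digits, an optional decimal part, and an optional percent sign.
NUMERAL_RE = re.compile(rf"^[{_DIGITS}]+(?:[.．][{_DIGITS}]+)?[%％]?$")


def generalize_numerals(tokens):
    """Strategy (2): map every token made up entirely of numeral characters to NUM_TOKEN."""
    return [NUM_TOKEN if NUMERAL_RE.match(t) else t for t in tokens]


def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Strategy (1): use the word2vec vector when one exists, else a small random vector."""
    rng = random.Random(seed)
    matrix = []
    for word in vocab:
        vec = pretrained.get(word)
        if vec is None:
            vec = [rng.uniform(-0.25, 0.25) for _ in range(dim)]
        matrix.append(list(vec))
    return matrix


print(generalize_numerals(["他", "于", "2017", "年", "出生"]))
# ['他', '于', '<NUM>', '年', '出生']
```

Only whole tokens are generalized, so words that merely contain a numeral character (such as 一样 or the name 张三) keep their own embeddings, while all numbers share one well-trained vector instead of many rare ones.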
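The post-processing stage can be sketched roughly as below for a single document. The tiny surname list, the 2 to 4 character length rule, and the `老`/`小` + surname pattern used for local diffusion are assumed example rules rather than the thesis's actual rule set, and `postprocess` and its helpers are hypothetical names. Candidates are `(start, name)` character offsets assumed to come from the recognition model.

```python
import re

# Hypothetical, deliberately tiny surname list; a real filter would use a full surname dictionary.
SURNAMES = {"张", "王", "李", "赵", "刘", "陈", "欧阳", "司马"}


def split_name(name):
    """Split a full name into (surname, given name), trying two-character compound surnames first."""
    for k in (2, 1):
        if name[:k] in SURNAMES:
            return name[:k], name[k:]
    return None


def passes_rules(name):
    """Assumed rule set: 2-4 characters, starts with a known surname, non-empty given name."""
    parts = split_name(name)
    return 2 <= len(name) <= 4 and parts is not None and len(parts[1]) >= 1


def find_all(text, pattern):
    """Start offsets of every non-overlapping occurrence of pattern in text."""
    return [m.start() for m in re.finditer(re.escape(pattern), text)]


def covered(spans, start, length):
    """True if [start, start + length) lies inside an already accepted span."""
    return any(s <= start and start + length <= s + len(n) for s, n in spans)


def postprocess(text, candidates):
    """Filter model candidates, then recall missed mentions by global and local diffusion."""
    # Rule filtering: drop candidates that fail the rule set (improves precision).
    accepted = {(s, n) for s, n in candidates if passes_rules(n)}
    names = {n for _, n in accepted}
    # Global diffusion: every other occurrence of an accepted full name in the document.
    for name in names:
        for start in find_all(text, name):
            accepted.add((start, name))
    # Local diffusion: partial references (given name alone, or the assumed 老/小 + surname forms).
    for name in names:
        surname, given = split_name(name)
        forms = ([given] if len(given) >= 2 else []) + ["老" + surname, "小" + surname]
        for form in forms:
            for start in find_all(text, form):
                if not covered(accepted, start, len(form)):
                    accepted.add((start, form))
    return sorted(accepted)


text = "张三丰到了北京。三丰说老张和张三丰都在。"
print(postprocess(text, [(0, "张三丰"), (5, "北京")]))
# [(0, '张三丰'), (8, '三丰'), (11, '老张'), (14, '张三丰')]
```

In this toy run the rule filter rejects 北京, global diffusion recovers the second 张三丰, and local diffusion recovers 三丰 and 老张 while skipping the copies of 三丰 that sit inside a full name.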
Keywords/Search Tags: Chinese Personal Name Recognition, Word Embedding, Recurrent Neural Network, Diffusion Operations