
Research On Named Entity Recognition For Chinese Text

Posted on: 2020-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Jia
Full Text: PDF
GTID: 2428330623956674
Subject: Computer Science and Technology
Abstract/Summary:
Named entity recognition (NER) is a fundamental task in natural language processing. It aims to automatically recognize entities with specific meanings in text, usually the names of persons, locations, and organizations. It transforms unstructured data into structured data and helps computers interpret text much as humans do, which makes it practically important for knowledge graphs, question answering systems, search engines, and other applications.

However, Chinese named entity recognition remains difficult because of the characteristics of the Chinese language: (1) Chinese text has no explicit boundaries between words, so recognition quality depends heavily on the accuracy of word segmentation. (2) Chinese named entities lack obvious morphological features such as the capitalization, prefixes, and suffixes found in English words. (3) Massive, multi-dimensional, cross-domain Chinese text data poses a greater challenge to recognition accuracy. (4) Chinese named entities often involve abbreviations, mixed Chinese and English, and nested entities, which make recognition more complex.

To address these difficulties, this thesis reviews a large body of research, in China and abroad, on named entity recognition. After a detailed analysis of the mainstream statistical methods, and taking into account the characteristics and development trends of deep learning, it identifies a direction for improvement: building a hybrid model that combines deep learning with statistical methods for Chinese named entity recognition. The research covers the following two aspects:

(1) A Chinese named entity recognition method based on multi-source embedding and a hybrid model is proposed. To avoid the problem of ambiguous Chinese word boundaries, recognition is based on character tagging. Because a single character carries insufficient semantic information, a multi-source embedding method for character vectors, pretrained on external corpora, is proposed. A hybrid model of a bidirectional long short-term memory (BiLSTM) network and a conditional random field (CRF) is constructed, so that each model compensates for the other's shortcomings. Several groups of comparative experiments examine the model's performance on named entity recognition in depth and demonstrate, step by step, the effectiveness of each part of the hybrid model.

(2) A named entity recognition model with character enhancement and an attention mechanism is proposed. Because Chinese characters lack internal spelling features, a convolutional neural network is used to extract additional character features, and the design of the network architecture is studied. On this basis, to exploit more feature information while keeping the dimension of the final character vector from growing too large, an attention-based method for automatically combining vectors is designed. This method loses no feature information and adds no extra computation.

Experiments show that the method achieves high recognition performance on Chinese named entity recognition without additional domain-specific dictionaries or hand-crafted features. The overall F1 score reached 91.11%, better than the traditional statistical methods and deep learning methods reported in the relevant literature, and the method is well suited to Chinese named entity recognition in today's big data setting.
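
The character-tagging formulation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common BIO labeling scheme (the abstract does not say which scheme the thesis uses), and the function names, the entity types PER/LOC/ORG, and the example sentence are all hypothetical choices made for illustration. The sketch shows only how labels are assigned per character, so that no word segmentation is needed, and how entity spans are recovered from a tag sequence; it is not the thesis's model.

```python
from typing import List, Tuple

# (start index, end index exclusive, entity type); hypothetical representation
Entity = Tuple[int, int, str]


def chars_to_bio(text: str, entities: List[Entity]) -> List[str]:
    """Assign one BIO tag per character, so no word segmentation is required."""
    tags = ["O"] * len(text)
    for start, end, etype in entities:
        tags[start] = f"B-{etype}"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"          # remaining characters of the entity
    return tags


def bio_to_entities(tags: List[str]) -> List[Entity]:
    """Recover entity spans from a per-character BIO tag sequence."""
    entities: List[Entity] = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last entity
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            entities.append((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return entities


# Illustrative sentence (not taken from the thesis): "Ma Yun founded Alibaba in Hangzhou"
text = "马云在杭州创办了阿里巴巴"
gold = [(0, 2, "PER"), (3, 5, "LOC"), (8, 12, "ORG")]

tags = chars_to_bio(text, gold)
for ch, tag in zip(text, tags):
    print(ch, tag)

# Decoding the tags gives back the original spans.
print([(text[s:e], t) for s, e, t in bio_to_entities(tags)])
```

In a tagger such as the BiLSTM-CRF hybrid described above, a sequence like `tags` would be the per-character training target, and a decoder like `bio_to_entities` would turn predicted tags back into entities before computing entity-level precision, recall, and F1.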
Keywords/Search Tags: Chinese named entity recognition, deep learning, character feature, attention mechanism