
Chinese Named Entity Recognition Based On Neural Network And Language Model

Posted on: 2019-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Zhao
GTID: 2428330611493360
Subject: Computer Science and Technology
Abstract/Summary:
In the 21st century, the volume of textual information on the Internet has exploded, and research on natural language processing has emerged to help people find what they need in this mass of information. Natural language processing is a collective term for a range of tasks, including but not limited to machine translation and sentiment analysis. These advanced tasks are supported by a series of basic tasks, and Named Entity Recognition (NER) is one of them. The quality of named entity recognition therefore directly affects the quality of subsequent tasks. Named entity recognition refers to identifying entities in text, such as the names of people, places, and institutions. Once the entities are identified, they can be handed over to subsequent tasks for further processing, such as entity disambiguation and entity linking. Discovering these entities and accurately locating their boundaries is the task of named entity recognition.

The named entity recognition task originated in English. English words are separated by spaces and carry prefixes and suffixes, and these features make recognizing English named entities easier. Several years ago, the accuracy and recall of English named entity recognition on benchmark test sets had already exceeded 90%. In contrast, Chinese named entity recognition is more difficult. First, Chinese text usually requires word segmentation before recognition, and the quality of the segmentation directly affects the recognition result. Second, Chinese words generally consist of only a few characters, which makes it difficult to use a CNN or LSTM to extract character-level features of words.

This paper analyzes the key issues of named entity recognition. On this basis, it discusses how to improve Chinese named entity recognition and proposes a new named entity recognition model. The paper mainly includes the following work:
(1) Word vectors and character vectors are pre-trained on the Chinese Giga-Word corpus using the word2vec model.
(2) A new named entity recognition model is proposed. The model takes the pre-trained word vectors and character vectors as input. It uses two LSTMs to process the word vectors and the character vectors respectively, and integrates the output of the word LSTM into the character LSTM through a highway network layer. Because all relevant words are taken into account, the impact of word segmentation errors is reduced.
(3) The named entity recognition task and a language model are trained jointly. The acquired features are transformed into different semantic spaces through highway networks, which avoids mutual interference between the tasks and improves the NER results.
Experiments were performed on multiple datasets, and the results showed that the model achieved a level comparable to the current best results without using other external labeled data or additional annotations.
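
The highway network layer mentioned in (2) and (3) can be illustrated with a minimal sketch. It follows the standard highway formulation, y = t * H(x) + (1 - t) * x, where t is a sigmoid gate. The parameter names (w_h, b_h, w_t, b_t), the choice of tanh for H, and the plain-list implementation are assumptions made for illustration only; the abstract does not specify how the thesis model parameterizes or wires its highway layers.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def affine(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Compute weights @ x + bias using plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def sigmoid(z: float) -> float:
    """Logistic function, used here as the transform gate."""
    return 1.0 / (1.0 + math.exp(-z))


def highway(x: Vector, w_h: Matrix, b_h: Vector,
            w_t: Matrix, b_t: Vector) -> Vector:
    """One highway layer: y = t * H(x) + (1 - t) * x.

    H(x) = tanh(w_h @ x + b_h) is the candidate transform (assumed form).
    t    = sigmoid(w_t @ x + b_t) decides, per dimension, how much of the
           transformed value replaces the original input.
    Input and output must have the same dimension.
    """
    h = [math.tanh(z) for z in affine(w_h, b_h, x)]
    t = [sigmoid(z) for z in affine(w_t, b_t, x)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


if __name__ == "__main__":
    # Hypothetical 2-dimensional feature vector and weights.
    features = [0.5, -1.0]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    print(highway(features, identity, [0.0, 0.0], zeros, [0.0, 0.0]))
```

With a strongly negative gate bias the layer passes its input through almost unchanged, and with a strongly positive one it outputs the transformed value. This gating is what lets a model learn how much of one feature stream to mix into another.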
Keywords/Search Tags:Named Entity Recognition, Bi-directional LSTM Networks, Conditional Random Field, Language model, Highway layer