
Research On Complex Chinese Named Entity Recognition Based On BiLSTM-CRF

Posted on: 2020-08-19  Degree: Master  Type: Thesis
Country: China  Candidate: Y Gu  Full Text: PDF
GTID: 2428330575952504  Subject: Computer technology
Abstract/Summary:
The rapid development of Internet social media has produced a huge amount of natural language data, which contains great value. Named entity recognition technology, as the most fundamental part of natural language processing, can extract key entities from text data and has considerable application value. However, some deficiencies remain in current Chinese named entity recognition technology. First, in the deep neural network models based on character-level embedding that are popular today, the sparse Chinese character vector space and insufficient information restrict the performance of the whole Chinese named entity recognition model. Second, complex named entity problems such as nested named entities, long texts, and contextual errors degrade the performance of the model in applications. In particular, when text data from the Internet is applied to practical projects, its poor standardization and confusing structure pose many challenges for named entity recognition. This paper conducts a series of studies on these issues, and its main contributions are as follows.

1) This paper reviews the development of named entity recognition technology and the advantages and disadvantages of commonly used models. It then demonstrates the advantages of the BiLSTM-CRF model based on character-level embedding over other models for Chinese named entity recognition, and implements the model.

2) To address the sparse Chinese character vector space and insufficient information in deep neural network models based on character-level embedding, this paper proposes an improved method of Chinese character-level feature representation. In terms of spatial representation, it uses the features of Chinese word formation to optimize the representation of Chinese character vectors based on position information. In terms of information supplement, it uses the topic model framework to construct a probability vector of Chinese character topics as an auxiliary feature that supplements the global information in the character-level feature representation. Experiments indicate that, compared with simple character vectors trained by word2vec, the improved representation improves the performance of the BiLSTM-CRF model based on character-level embedding.

3) For the complex Chinese named entity problem, this paper analyzes the factors that affect the recognition of Chinese named entities in application scenarios and proposes a cascaded deep neural network model for named entity recognition. The low-level network identifies coarse-grained named entities by modifying the loss function and the decoding method, and tries not to miss potential named entity information. The high-level network accepts the text segmented by the low-level network and adds a convolution layer and a pooling layer to identify named entity boundaries more exactly. The model uses the result of the high-level network as the final output of the named entity recognition system. Experiments show that the model achieves good results on both standard data sets and real network text data.
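The character-level feature representation in point 2) can be pictured with a minimal sketch. The abstract does not give the exact construction, so everything here is an assumption: the B/M/E/S position tags stand in for "position information", and the names `position_tags`, `char_features`, `pos_embeddings`, and `topic_probs` are hypothetical. The sketch only shows the general idea of a position-aware character vector concatenated with a per-character topic probability vector.

```python
from typing import Dict, List, Sequence, Tuple


def position_tags(words: Sequence[str]) -> List[Tuple[str, str]]:
    """Tag each character with its position inside its word.

    Assumed scheme: B (begin), M (middle), E (end), S (single-character word).
    """
    tagged: List[Tuple[str, str]] = []
    for word in words:
        if len(word) == 1:
            tagged.append((word, "S"))
            continue
        for i, ch in enumerate(word):
            tag = "B" if i == 0 else "E" if i == len(word) - 1 else "M"
            tagged.append((ch, tag))
    return tagged


def char_features(
    words: Sequence[str],
    pos_embeddings: Dict[Tuple[str, str], List[float]],
    topic_probs: Dict[str, List[float]],
    dim: int,
    num_topics: int,
) -> List[List[float]]:
    """Build one feature vector per character.

    Each vector is the position-aware embedding of the character followed by
    its topic probability vector. Unknown (character, tag) pairs fall back to
    a zero vector, and unknown characters to a uniform topic distribution;
    both fallbacks are illustrative choices, not taken from the thesis.
    """
    features: List[List[float]] = []
    for ch, tag in position_tags(words):
        embedding = pos_embeddings.get((ch, tag), [0.0] * dim)
        topics = topic_probs.get(ch, [1.0 / num_topics] * num_topics)
        features.append(list(embedding) + list(topics))
    return features
```

In a full system, vectors like these would feed the BiLSTM layer in place of plain word2vec character vectors. The embeddings and topic distributions would come from training (for example, a topic model fitted on a corpus) rather than from hand-built dictionaries.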
Keywords/Search Tags:Named Entity Recognition, Long Short Term Memory Network, Topic Model, Cascade Model