
Research And Application Of Named Entity Recognition Based On Bidirectional LSTM

Posted on: 2022-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Hu
GTID: 2518306560991659
Subject: Software engineering
Abstract/Summary:
Named Entity Recognition (NER) is an active research topic in natural language processing. It aims to identify named entities in text and classify them into entity types. NER is a basic task of natural language processing, and its results feed into follow-up tasks such as information extraction, question answering, and text classification, so its accuracy directly affects the performance of that subsequent work. With the development of deep learning, networks based on Long Short-Term Memory (LSTM) and its variants have achieved good results on NER tasks, but NER still faces many challenges: in the big data era, the number of named entities keeps growing and cannot be exhaustively listed in a dictionary; the growing amount of data makes the class imbalance problem more serious; deep learning methods consume increasing computing and memory resources; named entities in different domains follow their own regularities; and the definition of a word in Chinese text is rather vague. This thesis addresses these problems. Its main contributions are as follows.

(1) Traditional named entity recognition is weak at resolving polysemy: the learned word embedding cannot effectively represent the characteristics of the same word in different contexts, while combined word embedding models based on deep learning take up a lot of computing and memory resources. In response, we propose a new merged contextual word embedding model, which merges the word embeddings of the same word in different contexts so as to retain the characteristics of the word across those contexts. We also introduce our own custom data set of Chinese business names. Experimental results on a standard bidirectional long short-term memory network (Bi-LSTM) show that this method addresses these problems while maintaining recognition accuracy on both the public English data set and the custom Chinese data set.

(2) When training a named entity recognition model, class imbalance in the training set seriously affects the overall results: the model reaches higher accuracy on the majority class and lower accuracy on the minority classes. Two methods are used to reduce this effect. First, named entity replacement is used to increase the proportion of minority-class samples in the data set. Second, a loss function that focuses on positive samples is defined. Experimental results on a standard Bi-LSTM show that both methods alleviate the impact of class imbalance on training and improve the recognition accuracy of minority-class samples, on both the public English data set and the custom Chinese data set.
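The abstract does not give implementation details, so the following Python is only an illustrative sketch of the three ideas it names, under stated assumptions. It assumes that "merging" means averaging the contextual vectors of each word, that entity replacement swaps a BIO-tagged span for another entity of the same type, and that the loss focusing on positive samples can be approximated by a class-weighted cross-entropy. All function names, the lexicon format, and the weights are hypothetical and are not taken from the thesis.

```python
import math
import random
from collections import defaultdict


def merge_contextual_embeddings(occurrences):
    """Merge per-occurrence vectors into one vector per word.

    Assumption: merging is done by averaging; the thesis may merge differently.
    occurrences: iterable of (word, vector) pairs, one pair per occurrence.
    """
    sums = {}
    counts = defaultdict(int)
    for word, vec in occurrences:
        if word not in sums:
            sums[word] = [0.0] * len(vec)
        sums[word] = [s + v for s, v in zip(sums[word], vec)]
        counts[word] += 1
    return {w: [s / counts[w] for s in sums[w]] for w in sums}


def replace_entities(tokens, tags, lexicon, rng):
    """Swap each BIO-tagged entity span for another entity of the same type.

    lexicon (hypothetical format): entity type -> list of token lists.
    Spans whose type has no lexicon entry are kept unchanged.
    """
    out_tokens, out_tags = [], []
    i = 0
    while i < len(tokens):
        tag = tags[i]
        if tag.startswith("B-"):
            etype = tag[2:]
            j = i + 1
            while j < len(tokens) and tags[j] == "I-" + etype:
                j += 1
            candidates = lexicon.get(etype)
            new = rng.choice(candidates) if candidates else tokens[i:j]
            out_tokens.extend(new)
            out_tags.extend(["B-" + etype] + ["I-" + etype] * (len(new) - 1))
            i = j
        else:
            out_tokens.append(tokens[i])
            out_tags.append(tag)
            i += 1
    return out_tokens, out_tags


def weighted_cross_entropy(probs, gold, weights):
    """Weighted mean negative log-likelihood over tokens.

    Assumption: "focusing on positive samples" is approximated by giving
    entity tags a larger weight than the "O" tag.
    probs: list of dicts mapping tag -> predicted probability.
    """
    total = 0.0
    norm = 0.0
    for p, g in zip(probs, gold):
        w = weights.get(g, 1.0)
        total += -w * math.log(p[g])
        norm += w
    return total / norm
```

In this sketch, giving an entity tag a weight above 1.0 makes errors on that tag contribute more to the loss than errors on "O", which is one common way to counter a dominant majority class. Averaging keeps a single vector per word, so the embedding table stays the same size as a static one.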
Keywords/Search Tags: Named entity recognition, Deep learning, Bidirectional long short-term memory network, Word embedding, Class imbalance