Chinese Named Entity Recognition Based On Bidirectional LSTM-CRF Model

Posted on: 2020-08-24
Degree: Master
Type: Thesis
Country: China
Candidate: H Fang
GTID: 2428330578952347
Subject: Electronics and Communications Engineering
Abstract/Summary:
Named entity recognition (NER) is a fundamental task in natural language processing. Its goal is to identify the entities in a text that carry practical meaning, and it is widely used in information extraction, machine translation, and automatic question answering. The accuracy of named entity recognition has a profound impact on the downstream tasks that depend on it. Traditional named entity recognition methods require a large amount of manual annotation and specialized domain knowledge, whereas the LSTM-based approaches now in common use largely address this problem. A BiLSTM (Bidirectional Long Short-Term Memory) network makes efficient use of contextual information, while a CRF (Conditional Random Field) takes the order of the output labels into account. Neither model achieves high accuracy on its own, so the two are combined into a BiLSTM-CRF model to improve the accuracy of named entity recognition.

This thesis further optimizes the BiLSTM-CRF model, building on previous work. In preprocessing, single characters are used as the input unit so that more contextual features of the text can be captured. In training, the pre-trained model is tuned by adjusting key parameters such as the number of epochs, the batch size, and the dropout rate, which yields a better-optimized model. Experiments show that this method can compensate for a limited amount of training data and improve the accuracy of named entity recognition to some extent.

The BiLSTM-CRF model is evaluated on the MSRA corpus and the People's Daily corpus. The model learns to identify the named entities in each corpus automatically, and its predictions are compared with the manually labeled annotations. The best F-value is 93.66% for the named entity recognition task on the MSRA corpus and 92.72% on the People's Daily corpus. The experimental results show that the method combines contextual information with the ordering of tags and achieves good results.
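
To make the role of the CRF layer concrete, the following is a minimal sketch of Viterbi decoding over per-character tag scores. It is an illustration under stated assumptions, not the thesis's implementation: the tag set, the emission scores (which in a BiLSTM-CRF would come from the BiLSTM), the transition scores (which would be learned), and the name `viterbi_decode` are all hypothetical.

```python
from typing import List

# Hypothetical BIO tag set with a single entity type (person names).
TAGS = ["O", "B-PER", "I-PER"]

NEG = -1e4  # large penalty that effectively forbids a transition

# TRANSITIONS[i][j]: score of moving from TAGS[i] to TAGS[j].
# Hand-set here for illustration; a CRF layer would learn these values.
TRANSITIONS = [
    [0.0, 0.0, NEG],  # from O: O -> I-PER is invalid in the BIO scheme
    [0.0, 0.0, 0.0],  # from B-PER
    [0.0, 0.0, 0.0],  # from I-PER
]

# Made-up per-character scores standing in for BiLSTM outputs.
# Columns follow TAGS: [O, B-PER, I-PER].
CHARS = ["张", "三", "说"]
EMISSIONS = [
    [1.0, 0.8, 0.1],
    [0.2, 0.1, 2.0],
    [2.0, 0.1, 0.3],
]


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring tag index sequence."""
    n_tags = len(transitions)
    # score[j]: best score of any path that ends in tag j at this position.
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for reaching tag j at this position.
            best_i = max(range(n_tags),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        backpointers.append(pointers)
        score = new_score
    # Trace the best path backwards from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Tagging each character independently picks an invalid sequence.
greedy = [TAGS[max(range(len(TAGS)), key=lambda j: e[j])] for e in EMISSIONS]
print(greedy)   # ['O', 'I-PER', 'O']

# Decoding with transition scores repairs it.
decoded = [TAGS[i] for i in viterbi_decode(EMISSIONS, TRANSITIONS)]
print(decoded)  # ['B-PER', 'I-PER', 'O']
```

With these made-up scores, choosing the best tag for each character in isolation yields `O, I-PER, O`, which is not a valid BIO sequence. Taking the transition scores into account moves the first character to `B-PER`, which is the sense in which a CRF considers the order of the output labels.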
Keywords/Search Tags: Named entity recognition, Bidirectional long short-term memory, Conditional random field