Research On Chinese Named Entity Recognition Algorithm Based On BiLSTM-CRF Model

Posted on: 2022-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: P R Zhang
GTID: 2518306605966329
Subject: Master of Engineering
Abstract/Summary:
In recent years, deep learning has achieved great success in computer vision, speech recognition, and other fields, and it has also made considerable progress in natural language processing (NLP). Named entity recognition (NER), a key basic task in NLP, has likewise benefited from deep learning. In today's information age, people encounter a large amount of text every day. NER identifies key information in unstructured text, such as person names, place names, organization names, and other proper names, and so meets the need to capture the important information in a text quickly. NER is a basic tool in many applications, such as relation extraction, knowledge graphs, intelligent query, question answering systems, military command decision-making, and auxiliary reasoning. Effective NER research therefore lays a solid foundation for applications and follow-up research in these fields.

In this thesis, deep learning is used to build a Chinese NER model, the bidirectional long short-term memory conditional random field (BiLSTM-CRF) model, on top of the bidirectional long short-term memory (BiLSTM) model. The recognition targets are person names, place names, organization names, and other proper names. In addition, the NER models studied in this thesis are combined using a model fusion algorithm, and the resulting NER models are applied in practice. The work of this thesis is as follows.

(1) News text from People's Daily published in February 2020 is crawled. The crawled unstructured news text is used as the experimental data set after word segmentation, part-of-speech tagging, stop-word removal, and word tagging. A BiLSTM-CRF model for Chinese NER is proposed. Compared with the baseline BiLSTM model, the added CRF layer introduces feature constraints that help ensure the predictions of the BiLSTM layer are valid. Experiments show that on the news data set the F1 value of BiLSTM-CRF is 7.03% higher than that of the baseline model, the best recognition result among the models compared.

(2) A multi-model fusion of weighted voting (MMFWV) algorithm is proposed. A weighted voting algorithm is designed to calculate the weight of each base model, so that the advantages of several NER algorithms are combined. The base models for weighted voting are CRF, BiLSTM, and BiLSTM-CRF. Experiments show that the F1 value of the MMFWV algorithm is 6.25% higher than that of the multi-model fusion of majority voting (MMFMV) algorithm, giving better entity recognition.

(3) To bring Chinese NER technology closer to everyday use and save users time, all of the NER models above are presented in a visual software interface. A Chinese NER software system is built through overall design, coding, and careful testing. The software adopts a B/S (Browser/Server) architecture, comprising front-end display, back-end implementation, and front-end/back-end interaction, and is divided into modules for model training and testing, model usage, and others. The user enters unstructured text and selects one of the NER models, and the software recognizes the person names, place names, organization names, and other proper names in that text.
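The difference between majority voting and weighted voting in item (2) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not say how the weights are computed or at what granularity votes are cast, so the token-level voting, the function names `majority_vote` and `weighted_vote`, the example tags, and the weight values below are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def majority_vote(predictions):
    """Pick, for each token, the tag predicted by the most base models.

    predictions: dict mapping a model name to its list of tags
    (all lists must have the same length).
    """
    voted = []
    for tags in zip(*predictions.values()):
        # most_common(1) returns the tag with the highest count;
        # ties go to the tag encountered first.
        voted.append(Counter(tags).most_common(1)[0][0])
    return voted


def weighted_vote(predictions, weights):
    """Pick, for each token, the tag with the largest total model weight.

    weights: dict mapping a model name to a non-negative weight
    (hypothetical values; the thesis's weighting scheme is not given).
    """
    names = list(predictions)
    voted = []
    for tags in zip(*(predictions[name] for name in names)):
        scores = defaultdict(float)
        for name, tag in zip(names, tags):
            scores[tag] += weights[name]
        voted.append(max(scores, key=scores.get))
    return voted


# Hypothetical BIO tags from the three base models for a five-token sentence.
predictions = {
    "CRF":        ["B-PER", "I-PER", "O", "O",     "O"],
    "BiLSTM":     ["B-PER", "O",     "O", "B-LOC", "O"],
    "BiLSTM-CRF": ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"],
}
# Assumed weights, e.g. derived from each model's validation score.
weights = {"CRF": 0.2, "BiLSTM": 0.25, "BiLSTM-CRF": 0.55}

print(majority_vote(predictions))
print(weighted_vote(predictions, weights))
```

In this example the two schemes disagree only on the last token: two of the three models predict `O`, so majority voting returns `O`, while the larger assumed weight of BiLSTM-CRF lets weighted voting return `I-LOC`. Voting token by token can produce tag sequences that are not valid BIO sequences, so a practical system might vote on whole entity spans instead.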
Keywords/Search Tags: Chinese Named Entity Recognition, News Text, BiLSTM-CRF, Weighted Voting, Software Implementation