Research On Chinese Named Entity Recognition With External Knowledge And Application In Medical Field

Posted on: 2017-11-02
Degree: Master
Type: Thesis
Country: China
Candidate: J F Li
Full Text: PDF
GTID: 2348330503987202
Subject: Computer Science and Technology
Abstract/Summary:
Named Entity Recognition (NER) aims to identify person names, place names, organization names, and other entities in text. As one of the basic tasks of Natural Language Processing, it has been an active research topic for decades. With the development of statistical machine learning methods, entities that appear in the training corpus are now recognized very well, but recognizing out-of-vocabulary words remains one of the main difficulties of named entity recognition.

To address this problem, we first study how to merge a lexicon into the traditional CRF model so that the model can identify the entities contained in the lexicon. Experiments are carried out in the general domain using Wikipedia entries.

We then turn to deep neural networks, which have developed rapidly in recent years. RNNs and their improved variant, LSTM, perform very well in Natural Language Processing. In theory, an LSTM can use all of the preceding text during training, and a Bidirectional LSTM can use information from the whole sequence. We therefore adopt a Bidirectional LSTM model for named entity recognition and describe the design of the recognizer, which uses techniques such as dropout and transition cost calculation. Based on this model, we implement a named entity recognition tool in Python with Theano. Extensive experiments in the general domain show that the Bidirectional LSTM model outperforms the CRF model on the named entity recognition task, raising the F-value by about 2% in many groups of experiments. In addition, we use deep neural network pre-training techniques to add more external information to the Bidirectional LSTM model, and the experiments show that this has some effect.

Finally, we test the CRF model and the LSTM model on data from the medical field.
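The abstract does not spell out how the transition cost is computed or used. A common way to combine per-character tag scores from a Bidirectional LSTM with tag-to-tag transition scores is Viterbi decoding, and the following is a minimal pure-Python sketch of that idea. The function name `viterbi_decode`, the tag indexing, and the score values are assumptions made for illustration; they are not taken from the thesis or its Theano tool.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag sequence (hypothetical helper).

    emissions[t][j]   : score of tag j at position t (e.g. a BiLSTM output)
    transitions[i][j] : score of moving from tag i to tag j
    """
    if not emissions:
        return []
    n_tags = len(emissions[0])
    # best[j] = best score of any path that ends in tag j at the current position
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_best = []
        pointers = []
        for j in range(n_tags):
            # Pick the previous tag that gives the best path into tag j.
            prev = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final tag to recover the whole path.
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy example with assumed tags 0 = O, 1 = B, 2 = I.
# The O -> I transition is heavily penalized, since I should follow B or I.
emissions = [[2.0, 1.0, 0.0], [0.0, 0.0, 3.0], [2.0, 0.0, 0.0]]
transitions = [[0.0, 0.0, -100.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(viterbi_decode(emissions, transitions))  # [1, 2, 0], i.e. B, I, O
```

With all transition scores set to zero, the same input decodes to O, I, O, the per-position best tags. The penalty on O to I is what moves the first tag to B, which is the kind of sequence-level consistency a transition cost is meant to add.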
Merging the lexicon into the CRF model was effective for identifying the entities contained in the lexicon, and the bidirectional LSTM model still improved on the CRF model. When pre-trained vectors from an open-domain corpus that did not match the medical data were added to the bidirectional LSTM model, some performance was lost overall, but recognition of non-professional medical entities improved.
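The abstract does not describe how the lexicon is merged into the CRF model. One common approach, sketched below, is to match the text against the lexicon with forward maximum matching and give each character a lexicon tag that the CRF can use as an extra feature. The function name `lexicon_features`, the `B-LEX`/`I-LEX`/`O` tag names, the `max_len` parameter, and the toy lexicon are all assumptions made for illustration.

```python
from typing import List, Set


def lexicon_features(chars: str, lexicon: Set[str], max_len: int = 6) -> List[str]:
    """Tag each character by forward maximum matching against a lexicon.

    Hypothetical helper: returns one tag per character, where B-LEX marks the
    first character of a matched entry, I-LEX the rest, and O no match.
    """
    tags = ["O"] * len(chars)
    i = 0
    while i < len(chars):
        match_len = 0
        # Try the longest candidate first so longer entries win over prefixes.
        for length in range(min(max_len, len(chars) - i), 0, -1):
            if chars[i:i + length] in lexicon:
                match_len = length
                break
        if match_len:
            tags[i] = "B-LEX"
            for k in range(i + 1, i + match_len):
                tags[k] = "I-LEX"
            i += match_len  # continue after the matched entry
        else:
            i += 1
    return tags


# Toy lexicon; a real one might be built from Wikipedia entries or medical terms.
toy_lexicon = {"北京", "北京大学"}
print(lexicon_features("我在北京大学", toy_lexicon))
# ['O', 'O', 'B-LEX', 'I-LEX', 'I-LEX', 'I-LEX']
```

Each tag would then be added to the per-character feature set passed to the CRF, alongside the usual character and context features, so that the model can learn how far to trust a lexicon match.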
Keywords/Search Tags: named entity recognition, external knowledge, conditional random fields, LSTM, medical text processing