| Background: In Natural Language Processing, a Named Entity is a proper noun, such as a person's name or the name of an organization, and Named Entity Recognition (NER) is a technique for recognizing proper nouns in text and simultaneously classifying them by category. In previous research, NER has mostly been used to recognize common proper nouns such as the names of persons, organizations, and locations. It has been used far less often to recognize the names of Chinese Medicine Herbs (CMH) and Chinese Medicine Formulae (CMF). This research direction can nevertheless be significant. For instance, it can help researchers in Traditional Chinese Medicine (TCM) complete literature reviews more efficiently. Moreover, when NER is the initial step of an Information Extraction (IE) pipeline, the recognized names of CMH and CMF can be used in the following steps to enrich the diversity of the extracted information. Nowadays, training a neural network model is the most popular way to implement NER. Integrating a Chinese Word Segmentation (CWS) toolkit with keyword matching can also realize NER, but little research has adopted this method for recognizing the names of CMH and CMF. Moreover, the performance of NER for recognizing the names of CMH and CMF implemented by these two methods has not been compared. Objectives: Implement NER specifically for recognizing the names of CMH and CMF, both by integrating a CWS toolkit with keyword matching and by training a neural network. Then compare their performance to identify the advantages and disadvantages of the two methods. Methods: Compare the performance of the NER models realized by the two methods by comparing the precision, recall, and F1-measure of recognizing the names of CMH and CMF in the same text from TCM textbooks. When implementing the NER for recognizing the names of CMH and CMF by
integrating a CWS toolkit with keyword matching, three popular off-the-shelf CWS toolkits are employed: the "Jieba" CWS toolkit, the Tsinghua University THULAC CWS toolkit, and the Peking University "pkuseg" CWS toolkit. The performance of the NER implemented by integrating each of these toolkits with the same keyword dictionary is then compared to identify the best CWS toolkit for segmenting the names of CMH and CMF. After that, the best-performing CWS toolkit is optimized to further improve its performance. The NER realized by integrating the optimized CWS toolkit with keyword matching is then taken to represent the best performance that this method can achieve. Next, a Bi-directional Long Short-Term Memory (BLSTM) neural network and a Bi-directional Long Short-Term Memory-Conditional Random Field (BLSTM-CRF) neural network are trained and optimized for recognizing the names of CMH and CMF, and the better of the two is selected to represent the best performance that NER can achieve with a neural network. Finally, the performance of the NER realized by the two methods is compared to show which method is more feasible for recognizing the names of CMH and CMF. Results and Conclusion: The "Jieba" CWS toolkit performs better than the other two CWS toolkits when integrated with keyword matching to recognize the names of CMH and CMF. With the same training data, the BLSTM-CRF neural network performs better than the BLSTM neural network. Finally, the NER for recognizing the names of CMH and CMF implemented by integrating the "Jieba" CWS toolkit with keyword matching performs better than the BLSTM-CRF neural network when both process the same text from TCM textbooks. Additionally, it is simpler to implement the NER for recognizing the names of CMH and CMF by integrating a CWS toolkit with keyword matching. Its performance, however, largely rests on the completeness of the keyword
dictionary. Moreover, NER realized by this method cannot resolve ambiguity. By contrast, implementing the NER for recognizing the names of CMH and CMF by training a neural network is more complicated, but its performance does not rely on any dictionary and it is able to alleviate ambiguity. |
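
The dictionary-based method and its evaluation can be illustrated with a minimal sketch. It is not the study's implementation: a simple forward maximum matching segmenter stands in for a real CWS toolkit such as "Jieba", and the keyword dictionary, the sample sentence, the gold annotations, and all function names (`segment`, `recognize`, `prf`) are hypothetical choices made for illustration.

```python
from typing import Dict, List, Set, Tuple

Entity = Tuple[int, int, str]  # (start offset, end offset, label)

# Hypothetical toy dictionary; a real one would list many herb and formula names.
KEYWORDS: Dict[str, str] = {
    "人参": "CMH",
    "甘草": "CMH",
    "四君子汤": "CMF",
}


def segment(text: str, vocab: Set[str], max_len: int = 6) -> List[str]:
    """Forward maximum matching: a simple stand-in for a CWS toolkit."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def recognize(text: str, keywords: Dict[str, str]) -> Set[Entity]:
    """Segment the text, then keep tokens found in the keyword dictionary."""
    entities: Set[Entity] = set()
    offset = 0
    for token in segment(text, set(keywords)):
        if token in keywords:
            entities.add((offset, offset + len(token), keywords[token]))
        offset += len(token)
    return entities


def prf(predicted: Set[Entity], gold: Set[Entity]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 (exact span and label match)."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical sentence and gold annotations; 白术 and 茯苓 are deliberately
# missing from KEYWORDS to show the dependence on dictionary completeness.
text = "四君子汤由人参白术茯苓甘草组成"
gold = {(0, 4, "CMF"), (5, 7, "CMH"), (7, 9, "CMH"), (9, 11, "CMH"), (11, 13, "CMH")}
predicted = recognize(text, KEYWORDS)
precision, recall, f1 = prf(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case every predicted name is correct, so precision is 1.0, but the two herbs absent from the dictionary are missed and recall falls to 0.6. This mirrors the point that the method's performance rests on the completeness of the keyword dictionary.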