| Ancient books of traditional Chinese medicine (TCM) crystallize thousands of years of the Chinese nation's wisdom and are among the most precious treasures of traditional Chinese culture. However, ancient Chinese texts have no explicit separators between words or sentences, which makes sentence segmentation difficult. They also contain many function words such as prepositions and conjunctions, their sentence semantics are complex and varied, and their content is highly specialized. As a result, deep-learning-based named entity recognition on these texts suffers from fuzzy entity boundaries, recognition errors on some single-word entities, and a lack of TCM domain datasets. To address these problems, this thesis proposes a knowledge-embedding-based entity recognition algorithm for ancient Chinese books. The algorithm builds on BERT-BiLSTM and adds a knowledge embedding layer to strengthen the extraction of boundary information. It is then further optimized by combining it with a character-based named entity recognition algorithm to improve label recognition for single-word entities. The main research content of this thesis is as follows: (1) To address the difficulty of boundary identification caused by the absence of clear boundaries between words and sentences in ancient TCM books, a Named Entity Recognition algorithm for Ancient Chinese Books Based on Knowledge Embedding (NER-KE) is proposed. The algorithm uses an external knowledge base to obtain the candidate entities in a short text through dictionary matching, and concatenates the candidate entity information with the feature vectors of the input text sequence. This embeds word boundary information and strengthens the extraction and use of entity boundary features. (2) To address the complex and varied meanings of ancient Chinese books and the difficulty of recognizing single-word entities, a joint-decision-based entity recognition algorithm for ancient Chinese books is proposed on the basis of NER-KE. It draws on the strengths of character-based named entity recognition in single-word entity recognition, namely extracting character-level features and strengthening label sequence constraints. A single-word entity auxiliary recognition algorithm is added on top of the knowledge embedding algorithm, the recognition results of the two algorithms are judged jointly, and the NER-KE results are corrected accordingly. The algorithm therefore performs better on both multi-word and single-word entities, which improves its overall recognition accuracy. (3) An automatic indexing system for ancient Chinese books is designed and built on the basis of the knowledge-embedding-based Chinese named entity recognition algorithm, which can effectively improve the working efficiency of indexing personnel. |
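
The dictionary-matching and concatenation step described in (1) might look roughly like the sketch below. It is an illustration under stated assumptions, not the thesis implementation: the forward maximum matching strategy, the O/B/M/E/S boundary tag set, the one-hot encoding, and all function names (`match_candidates`, `boundary_tags`, `splice_features`) are hypothetical, and the two-dimensional character vectors merely stand in for BERT outputs.

```python
from typing import List, Sequence, Set, Tuple

# Assumed boundary tag set: Outside, Begin, Middle, End, Single.
BOUNDARY_TAGS = ["O", "B", "M", "E", "S"]


def match_candidates(text: str, lexicon: Set[str], max_len: int = 4) -> List[Tuple[int, int]]:
    """Forward maximum matching: return [start, end) spans of lexicon entries."""
    spans = []
    i = 0
    while i < len(text):
        matched = False
        # Try the longest window first, then shrink it.
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in lexicon:
                spans.append((i, i + length))
                i += length
                matched = True
                break
        if not matched:
            i += 1
    return spans


def boundary_tags(text: str, spans: Sequence[Tuple[int, int]]) -> List[str]:
    """Turn candidate spans into one boundary tag per character."""
    tags = ["O"] * len(text)
    for start, end in spans:
        if end - start == 1:
            tags[start] = "S"
        else:
            tags[start] = "B"
            for k in range(start + 1, end - 1):
                tags[k] = "M"
            tags[end - 1] = "E"
    return tags


def splice_features(char_features: Sequence[Sequence[float]], tags: Sequence[str]) -> List[List[float]]:
    """Concatenate each character vector with a one-hot boundary vector."""
    fused = []
    for vec, tag in zip(char_features, tags):
        one_hot = [1.0 if tag == t else 0.0 for t in BOUNDARY_TAGS]
        fused.append(list(vec) + one_hot)
    return fused


# Toy example: a hypothetical lexicon and placeholder character features.
text = "人参补气"
lexicon = {"人参", "气"}
spans = match_candidates(text, lexicon)
tags = boundary_tags(text, spans)
features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
fused = splice_features(features, tags)
```

In this toy run, the fused vectors would then be passed to the downstream sequence model in place of the raw character features, so that boundary hints from the knowledge base are available at every position.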