| In the field of traditional Chinese medicine (TCM), a large number of ancient texts have been handed down. Their content is extensive and profound, their language is concise and distinctive, and their syntax is varied, which makes it very difficult to extract entity relations from them. At present, most TCM knowledge is stored in unstructured natural-language text, and research on entity relation extraction for TCM is still scarce. To better integrate TCM knowledge, important information must be obtained from text automatically and TCM relation triples extracted from it. Information extraction serves exactly this purpose, and entity recognition and relation extraction are two of its subtasks. Early supervised approaches usually adopted a pipeline, treating entity recognition and relation extraction as two independent tasks. This approach has an obvious shortcoming: errors made during entity recognition propagate to the downstream relation extraction step and accumulate, degrading the extracted relations. To address this, researchers proposed joint models that exploit the interaction between the two tasks. However, traditional joint models generally rely heavily on complex feature engineering. Moreover, many models do not model the whole sentence directly, so they cannot handle overlapping relations, that is, multiple relational triples in the same sentence that share a single entity or the same entity pair. To solve these problems, this thesis makes the following contributions:
1) To address relation overlap in the joint extraction of entity-relation triples from TCM texts, a joint extraction method for TCM text based on the Hierarchical Binary Tagging framework (HBT) is proposed. The framework is compared with a traditional pipeline model and a BiLSTM encoder-decoder-CNN joint model. Between these two baselines, the BiLSTM encoder-decoder-CNN joint model achieves an F1 score 1.3% higher than the pipeline model. The HBT framework makes full use of a pre-trained language model and outperforms both baselines in extracting overlapping relations. 2) Because annotating TCM text is very expensive, applying pre-trained text representation models to mine TCM data in depth can reduce labeling costs while still extracting feature information effectively. However, TCM semantics are complex and sentences are short. To alleviate error propagation in the joint extraction task, the character-level BERT model is improved upon: TCM features are extracted with BERT-wwm, which uses whole word masking, and this significantly improves performance on joint entity-relation extraction from TCM text. 33 figures; 7 tables; 51 references... |
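The cascade idea behind HBT can be illustrated as a two-level decoder. A first tagger marks the start and end tokens of every subject. Then, for each subject and each relation, a second tagger marks the start and end tokens of the matching objects. Because objects are tagged separately per (subject, relation) pair, one subject can take part in several triples, and this is how overlapping relations are recovered. What follows is a minimal sketch under stated assumptions, not the thesis implementation. The taggers are hypothetical stand-ins with hand-set probabilities in place of a trained pre-trained-language-model encoder. The relation names `treats` and `contains`, the example sentence, the 0.5 threshold, and the nearest-end span pairing are all illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices
Probs = List[float]


def decode_spans(start_probs: Sequence[float], end_probs: Sequence[float],
                 threshold: float = 0.5) -> List[Span]:
    """Pair every tagged start with the nearest tagged end at or after it."""
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [i for i, p in enumerate(end_probs) if p > threshold]
    spans: List[Span] = []
    for s in starts:
        following = [e for e in ends if e >= s]
        if following:
            spans.append((s, following[0]))
    return spans


def extract_triples(tokens: Sequence[str],
                    subject_tagger: Callable[[Sequence[str]], Tuple[Probs, Probs]],
                    object_tagger: Callable[[Sequence[str], Span, str], Tuple[Probs, Probs]],
                    relations: Sequence[str],
                    threshold: float = 0.5) -> List[Tuple[str, str, str]]:
    """Level 1 tags subject spans; level 2 tags object spans per (subject, relation)."""
    def text(span: Span) -> str:
        return " ".join(tokens[span[0]:span[1] + 1])

    triples = []
    subj_start, subj_end = subject_tagger(tokens)
    for subj in decode_spans(subj_start, subj_end, threshold):
        for rel in relations:
            obj_start, obj_end = object_tagger(tokens, subj, rel)
            for obj in decode_spans(obj_start, obj_end, threshold):
                triples.append((text(subj), rel, text(obj)))
    return triples


# --- Hypothetical toy taggers with hand-set probabilities (not trained models) ---
TOKENS = ["Mahuang", "decoction", "treats", "wind-cold", "exterior", "excess",
          "syndrome", "and", "contains", "ephedra"]


def tags(n: int, start: int, end: int) -> Tuple[Probs, Probs]:
    """Start/end probability vectors with one tagged span."""
    s, e = [0.0] * n, [0.0] * n
    s[start], e[end] = 0.9, 0.9
    return s, e


def toy_subject_tagger(tokens):
    return tags(len(tokens), 0, 1)          # "Mahuang decoction"


def toy_object_tagger(tokens, subj, rel):
    n = len(tokens)
    if subj == (0, 1) and rel == "treats":
        return tags(n, 3, 6)                # "wind-cold exterior excess syndrome"
    if subj == (0, 1) and rel == "contains":
        return tags(n, 9, 9)                # "ephedra"
    return [0.0] * n, [0.0] * n             # no object for this relation


triples = extract_triples(TOKENS, toy_subject_tagger, toy_object_tagger,
                          relations=["treats", "contains"])
for t in triples:
    print(t)
# ('Mahuang decoction', 'treats', 'wind-cold exterior excess syndrome')
# ('Mahuang decoction', 'contains', 'ephedra')
```

The two printed triples share the subject "Mahuang decoction". This is a single-entity overlap, which a scheme assigning one relation label per token cannot express. In a trained model, both taggers would typically be sigmoid output layers over encoder states, and the object tagger would be conditioned on the subject representation.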
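Whole word masking changes only how positions are chosen for masking during pre-training. Chinese BERT splits text into single characters, and its original masking picks characters independently. BERT-wwm instead masks every character of a selected word together. The sketch below is minimal and makes several assumptions. It takes a sentence that an external Chinese word segmenter has already split into words; the segmentation shown is hypothetical. It also omits BERT's 80/10/10 mask/random/keep replacement for brevity. The function name `whole_word_mask`, the default 15% ratio, and the skip-if-over-budget rule are illustrative choices, not BERT-wwm's exact code.

```python
import random
from typing import List, Sequence, Tuple

MASK = "[MASK]"


def whole_word_mask(words: Sequence[str], mask_ratio: float = 0.15,
                    seed: int = 0) -> Tuple[List[str], List[int]]:
    """Mask every character of randomly chosen words (whole word masking).

    `words` is a pre-segmented sentence. Each Chinese character becomes one
    token, as in character-level Chinese BERT. Returns the masked tokens and
    the sorted indices of the masked positions.
    """
    rng = random.Random(seed)
    # Character-level tokens plus, for each word, the positions it covers.
    tokens: List[str] = []
    word_spans: List[List[int]] = []
    for w in words:
        word_spans.append(list(range(len(tokens), len(tokens) + len(w))))
        tokens.extend(w)
    budget = max(1, round(len(tokens) * mask_ratio))

    order = list(range(len(words)))
    rng.shuffle(order)                      # random word selection order
    masked: List[int] = []
    for wi in order:
        if len(masked) >= budget:
            break
        if len(masked) + len(word_spans[wi]) > budget:
            continue                        # skip words that would overshoot the budget
        masked.extend(word_spans[wi])       # mask the whole word, never part of it

    masked_set = set(masked)
    out = [MASK if i in masked_set else tok for i, tok in enumerate(tokens)]
    return out, sorted(masked)


# Hypothetical segmentation of a 10-character sentence; a ratio of 0.2 gives a budget of 2 tokens.
words = ["麻黄汤", "主治", "风寒", "表实证"]
masked_tokens, masked_idx = whole_word_mask(words, mask_ratio=0.2, seed=0)
print(masked_tokens)
```

Because a word such as 主治 ("mainly treats") is hidden all at once, the model cannot recover it from a visible sibling character and has to rely on the wider sentence context. This is the general motivation behind whole word masking.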