Joint Extraction Of TCM Knowledge As A Multi-head Selection Problem Based On Bert-wwm-ext And Loss Optimization

Posted on: 2022-08-06
Degree: Master
Type: Thesis
Country: China
Candidate: C Tan
GTID: 2518306536991659
Subject: Computer Science and Technology
Abstract/Summary:
The theoretical knowledge system of traditional Chinese medicine (TCM) is large, and its entities and the relations between them are intricate. Appropriate technical means are needed to organize and store knowledge in this field so that it can be applied flexibly in daily life. Compared with a relational database, a knowledge graph is better suited to organizing and storing the theoretical knowledge of TCM, and accurate, efficient entity relation extraction is an important foundation for constructing high-quality knowledge graphs. Based on the self-built CoNER&RE-TCM corpus, this paper aims to build a high-performance multi-head selection joint extraction model, and carries out research and experiments on two problems of the multi-head selection model: the weak semantic representation of character vectors and the weak ability to adapt to imbalanced relation categories.

Firstly, unstructured and semi-structured TCM text data were annotated with entities and the relations between them, and the CoNER&RE-TCM corpus was constructed for entity relation extraction. The characteristics of the entities and relations in the corpus are analyzed, laying a foundation for the research on the joint entity relation extraction model for TCM.

Secondly, to address the weak character vector representation ability of the multi-head selection model, a method for enhancing the semantic representation ability of Chinese text word vectors is proposed, and the corresponding module is constructed. In this module, the Bert-wwm-ext pre-trained model is used as the embedding layer, and a BiGRU network and a residual network are used together as the hybrid encoding layer, jointly improving the character vector extraction ability of the model.

Thirdly, to address the imbalance of relation categories in Chinese medical knowledge text data, a loss-optimization method for enhancing adaptability to category imbalance is proposed. This method applies a self-set threshold after the relation classification layer to automatically divide the effective relation samples into sufficiently classifiable relations and insufficiently classifiable relations. The loss of the insufficiently classifiable relations is then calculated, so that the model focuses more on learning them, which alleviates the influence of relation category imbalance on the precision and recall of relation extraction.

Finally, under the Ubuntu 18.04 operating system and based on the TensorFlow framework, a multi-head selection joint extraction model for TCM knowledge based on Bert-wwm-ext and loss optimization was constructed. Entity relation extraction experiments were carried out on the CoNER&RE-TCM corpus, and the experimental results were analyzed.
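The threshold-based loss described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the function name `thresholded_bce`, the use of binary cross-entropy, the rule that a sample counts as sufficiently classifiable when the predicted probability of its gold label reaches the threshold, the choice to drop such samples from the loss entirely, and the default value 0.9.

```python
import math

def thresholded_bce(probs, labels, threshold=0.9, eps=1e-12):
    """Mean binary cross-entropy over 'insufficiently classifiable' samples.

    probs     -- predicted probabilities in [0, 1], one per candidate relation
    labels    -- gold labels (0 or 1), same length as probs
    threshold -- hypothetical cut-off (not taken from the thesis): a sample
                 whose probability for its gold label reaches this value is
                 treated as sufficiently classifiable and skipped
    """
    losses = []
    for p, y in zip(probs, labels):
        # Probability the model assigns to the gold label of this sample.
        p_gold = p if y == 1 else 1.0 - p
        if p_gold >= threshold:
            continue  # sufficiently classifiable: contributes no loss
        losses.append(-math.log(max(p_gold, eps)))
    # Return 0.0 when every sample is already sufficiently classifiable.
    return sum(losses) / len(losses) if losses else 0.0

# Two confident predictions are skipped; the two uncertain ones drive the loss.
example_loss = thresholded_bce([0.95, 0.40, 0.05, 0.70], [1, 1, 0, 0])
```

In this sketch the many easy samples of frequent relation categories stop contributing once they are predicted confidently, so the remaining loss is dominated by hard samples, which is one plausible way to read "focuses more on learning insufficiently classifiable relations".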
Keywords/Search Tags: joint extraction, bert-wwm-ext, loss optimization, corpus construction, TCM