| Traditional Chinese Medicine (TCM) is an ancient system of medicine that carries the experience and theoretical knowledge the Chinese people accumulated in their struggle against disease; it is a theoretical system formed and developed gradually through long-term medical practice. The diagnostic process of TCM can be broadly summarized in three steps: diagnosis, syndrome identification, and prescription formulation. First, the doctor carefully observes the patient, listens, questions, and feels the pulse to understand the symptoms (Symptom), then summarizes the symptoms and identifies the syndromes (Syndrome), and finally selects herbs (Herb) according to their functions to form a prescription that relieves the symptoms. During diagnosis and formulation, the practitioner must consider both the action of each herb and the rules of herb compatibility, which involves a large number of TCM entities. The whole process requires not only theoretical study but also extensive clinical practice and summarization. As a result, TCM prescribing is highly individualized, which hampers both new drug discovery and the teaching of TCM. The long period of practice required and the inconsistent standards seriously hinder the development of TCM at home and abroad. To develop and promote TCM quickly and efficiently, it is important to discover herb dispensing patterns and to generate herbal prescriptions through machine learning, especially the deep learning methods that have emerged in recent years. Graph Neural Networks (GNNs) are well suited to discovering complex relational patterns and new patterns in heterogeneous data. In this study, HGCL (TCM data mining via Heterogeneous Graph Contrastive Learning) combines heterogeneous graph learning with contrastive learning to learn representations of TCM entity nodes for prescription generation and prediction. The model uses a heterogeneous network (THIN) to represent the complex data. The heterogeneous node representations of THIN are obtained by a graph neural network trained with a novel contrastive loss function for heterogeneous graphs that uses augmented negative samples; herb prediction and prescription generation are then performed using distances between node vectors. From the system perspective, after unsupervised training the model takes the patient’s symptom set as input and outputs the symptoms and the prescribed herbs, which facilitates the development of an auxiliary system that automatically generates prescriptions from symptoms that patients enter themselves. In addition to offering a flexible data structure, graph networks can mitigate the low frequency of some herbs and the resulting matrix sparsity in TCM data mining, and can provide a reference for developing new prescriptions. The HGCL model has been validated on the public datasets TCMRel and ChP2015 and on the lung cancer dataset LuCa provided by collaborating hospitals, where it shows improvements over several comparable baseline models. The work has been published at the international conference Health Information Science (HIS 2022). |
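The two ideas described above, contrastive training of node vectors and herb prediction by node vector distance, can be illustrated with a minimal sketch. It is not the HGCL implementation: the loss shown is the generic InfoNCE objective rather than the augmented negative sample loss of the paper, cosine similarity stands in for the distance measure, and all names (`info_nce`, `recommend_herbs`, `herb_a`, and the toy three-dimensional embeddings) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def info_nce(anchor, positive, negatives, temperature=0.5):
    """Generic InfoNCE loss for one anchor node.

    Assumption: this stands in for the paper's own loss, which is not
    specified in the text above.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


def recommend_herbs(symptoms, symptom_vecs, herb_vecs, k=2):
    """Rank herbs by similarity to the mean embedding of the symptom set."""
    vecs = [symptom_vecs[s] for s in symptoms]
    # Average the symptom embeddings dimension by dimension.
    query = [sum(col) / len(vecs) for col in zip(*vecs)]
    ranked = sorted(
        herb_vecs,
        key=lambda h: cosine(query, herb_vecs[h]),
        reverse=True,
    )
    return ranked[:k]


# Made-up embeddings, standing in for trained node vectors.
symptom_vecs = {"cough": [0.9, 0.1, 0.0], "fatigue": [0.1, 0.9, 0.0]}
herb_vecs = {
    "herb_a": [0.8, 0.2, 0.1],
    "herb_b": [0.0, 0.1, 0.9],
    "herb_c": [0.2, 0.8, 0.0],
}

print(recommend_herbs(["cough"], symptom_vecs, herb_vecs, k=2))
```

In this toy setup the loss is small when the anchor and its positive sample point in the same direction and large when a negative sample is closer, which is the property that pulls related TCM entities together in the embedding space. Averaging the symptom vectors is only one simple way to turn a symptom set into a single query vector.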