In recent years, with the continuous improvement of medical information systems, the volume of medical data has grown explosively. Clinical data such as medical images, electronic health/medical records, and medical knowledge graphs have great research significance and value. In the era of medical big data, how to analyze and use these data for patient similarity research is of great interest in the field of healthcare. As a key step toward personalized precision medicine, patient similarity research aims to derive clinically meaningful distance metrics that measure the similarity between patients according to their key clinical indicators. A medical concept, i.e., a medical entity, is a name or term that appears in medical data, such as a prescription, a disease, or a procedure. Medical concepts carry rich implicit semantic relationships, and the interrelations among them are highly complex. Learning better distributed representations of medical entities is therefore the key to obtaining precise patient representations and improving the performance of patient similarity learning. We study both distributed representation learning of medical entities and patient similarity learning, and establish medical entity representation methods based on word embedding and joint embedding. Finally, we apply a deep learning model based on Siamese-CNN to patient similarity learning. The research contents and main contributions are as follows.

First, the meaning of a medical concept depends heavily on temporal information, so temporal information is essential for learning distributed representations of medical entities. Moreover, concepts related to chronic conditions tend to have larger temporal scopes, while concepts related to acute conditions tend to have smaller ones. To address this, we introduce a variable temporal context window on top of the original Skip-gram model to model the temporal scope of each medical concept, so that both semantic and temporal information are captured when learning medical concept representations. In addition, to account for the time-series information of medical concept sequences in the patient representation, we use a temporal patient representation constructed by stacking, in order, the embeddings of the medical concepts that appear in a patient's medical records (the patient's medical concept sequence).

Second, many studies obtain either textual feature representations of medical entities through text mining or structural feature representations through medical knowledge graph embedding, but they ignore how medical entity descriptions and the medical knowledge graph can reinforce each other in shaping entity representations. To solve this problem, we adopt a joint embedding strategy that incorporates the contextual information of medical entity descriptions while extracting the structural information of the medical knowledge graph. Our joint embedding model thus learns simultaneously from medical knowledge triples directly observed in a given medical knowledge graph and from medical entity descriptions, which carry rich semantic information about these entities.

Third, the temporal patient representation suffers from dimensional inconsistency, since patients have different numbers of recorded concepts, while Siamese-CNN constrains both the aspect ratio and the scale of its input patient representations. To address this, we apply spatial pyramid pooling to Siamese-CNN, thereby extracting fixed-size spatial features from temporal patient representations of arbitrary size.
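The variable temporal context window can be illustrated by how Skip-gram training pairs are generated. The sketch below is a minimal illustration, not the thesis's implementation: it assumes each event is a (concept, day) pair and that every concept has a hypothetical temporal scope in days (`scope_days`), larger for chronic conditions and smaller for acute ones. A concept's context is every other concept recorded within its own scope, instead of a fixed number of neighboring positions. The resulting pairs would then feed an ordinary Skip-gram objective (for example with negative sampling), which is not shown.

```python
from typing import Dict, List, Tuple

# One event in a patient's record: (medical concept code, day it was recorded).
Event = Tuple[str, int]


def skipgram_pairs(events: List[Event], scope_days: Dict[str, int],
                   default_scope: int = 0) -> List[Tuple[str, str]]:
    """Generate (target, context) pairs for Skip-gram training.

    Assumed scheme: instead of a fixed window over sequence positions, each
    target concept looks at a time span of +/- scope_days[target] days around
    its own timestamp, so chronic concepts (large scope) collect more context
    than acute ones (small scope).
    """
    pairs = []
    for i, (target, t_i) in enumerate(events):
        scope = scope_days.get(target, default_scope)
        for j, (context, t_j) in enumerate(events):
            if i != j and abs(t_i - t_j) <= scope:
                pairs.append((target, context))
    return pairs


# Hypothetical record and scopes, for illustration only.
events = [("diabetes", 0), ("flu", 10), ("cough", 11), ("metformin", 12)]
scope_days = {"diabetes": 365, "flu": 3, "cough": 2, "metformin": 30}
pairs = skipgram_pairs(events, scope_days)
```

Because the window belongs to the target concept, the pairs are asymmetric: the chronic concept `diabetes` picks up the acute `flu` as context, but `flu`, with its three-day scope, does not reach back to `diabetes`.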
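The joint embedding objective can be sketched with TransE-style energies, where a triple (h, r, t) is plausible when h + r lies close to t. Everything below is an assumed formulation for illustration only: the description encoder is a plain average of hypothetical word vectors (a real model would more likely use a learned text encoder), and the joint energy sums the TransE energy over all four combinations of structural and description-based head/tail embeddings, so that the two views of each entity are pushed to agree. Parameter updates are omitted; only the scoring and the margin loss are shown.

```python
from typing import Dict, List, Sequence

Vec = List[float]


def mean_vec(vectors: Sequence[Vec]) -> Vec:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def transe_energy(h: Vec, r: Vec, t: Vec) -> float:
    """TransE energy ||h + r - t||_1: small when h + r lands near t."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))


def description_embedding(entity: str, descriptions: Dict[str, List[str]],
                          word_vecs: Dict[str, Vec]) -> Vec:
    """Hypothetical description encoder: average of the description's word vectors."""
    return mean_vec([word_vecs[w] for w in descriptions[entity]])


def joint_energy(h: str, rel: str, t: str, struct: Dict[str, Vec],
                 rels: Dict[str, Vec], descriptions: Dict[str, List[str]],
                 word_vecs: Dict[str, Vec]) -> float:
    """Assumed joint energy: TransE over structure/description combinations."""
    r = rels[rel]
    hs, ts = struct[h], struct[t]
    hd = description_embedding(h, descriptions, word_vecs)
    td = description_embedding(t, descriptions, word_vecs)
    return (transe_energy(hs, r, ts) + transe_energy(hd, r, td)
            + transe_energy(hs, r, td) + transe_energy(hd, r, ts))


def margin_loss(pos: float, neg: float, margin: float = 1.0) -> float:
    """Margin ranking loss: an observed triple should score at least
    `margin` lower than a corrupted one."""
    return max(0.0, margin + pos - neg)


# Hypothetical toy knowledge graph and vocabulary.
struct = {"diabetes": [0.0, 0.0], "insulin": [1.0, 1.0], "aspirin": [3.0, 0.0]}
rels = {"treated_by": [1.0, 1.0]}
descriptions = {"diabetes": ["high", "glucose"], "insulin": ["hormone"],
                "aspirin": ["painkiller"]}
word_vecs = {"high": [1.0, -1.0], "glucose": [-1.0, 1.0],
             "hormone": [1.0, 1.0], "painkiller": [3.0, 0.0]}

pos = joint_energy("diabetes", "treated_by", "insulin", struct, rels, descriptions, word_vecs)
neg = joint_energy("diabetes", "treated_by", "aspirin", struct, rels, descriptions, word_vecs)
loss = margin_loss(pos, neg)
```

Here the observed triple (diabetes, treated_by, insulin) has zero energy in every view, while the corrupted tail `aspirin` scores 3 in each of the four terms, so the margin loss is already satisfied.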
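The effect of spatial pyramid pooling on variable-length temporal patient representations can be sketched as follows. This is a simplified illustration under stated assumptions: the learned convolution layers of Siamese-CNN are omitted, so pooling is applied directly to the stacked concept embeddings, and the pyramid levels `(1, 2)`, the embeddings, and the Euclidean distance are hypothetical choices. Each level L splits the matrix into L x L bins and max-pools each bin, so a patient with three recorded concepts and a patient with two both yield a vector of length 1 + 4 = 5 that the shared (Siamese) branch can compare.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def stack_embeddings(concepts: Sequence[str], emb: Dict[str, List[float]]) -> Matrix:
    """Temporal patient representation: one row per concept, in record order."""
    return [list(emb[c]) for c in concepts]


def spp(matrix: Matrix, levels: Sequence[int] = (1, 2)) -> List[float]:
    """Spatial pyramid pooling: for each level L, split rows and columns into
    L x L bins (floor/ceil boundaries, so no bin is empty) and max-pool each
    bin. Output length is sum(L * L), independent of the matrix size."""
    rows, cols = len(matrix), len(matrix[0])
    features = []
    for L in levels:
        for i in range(L):
            r0, r1 = math.floor(i * rows / L), math.ceil((i + 1) * rows / L)
            for j in range(L):
                c0, c1 = math.floor(j * cols / L), math.ceil((j + 1) * cols / L)
                features.append(max(matrix[r][c]
                                    for r in range(r0, r1)
                                    for c in range(c0, c1)))
    return features


def siamese_distance(a: Matrix, b: Matrix, levels: Sequence[int] = (1, 2)) -> float:
    """Both patients pass through the same (shared) feature extractor;
    a smaller Euclidean distance means more similar patients."""
    fa, fb = spp(a, levels), spp(b, levels)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


# Hypothetical 2-dimensional concept embeddings.
emb = {"d1": [1.0, 0.0], "d2": [0.0, 2.0], "m1": [3.0, 1.0]}
p1 = stack_embeddings(["d1", "d2", "m1"], emb)  # 3 x 2 matrix
p2 = stack_embeddings(["d1", "m1"], emb)        # 2 x 2 matrix
dist = siamese_distance(p1, p2)
```

Pooling with floor/ceil bin boundaries lets adjacent bins overlap when the height is not divisible by L (here rows 0 to 1 and rows 1 to 2 of `p1`), which keeps every bin non-empty even for very short records.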