
Tempo-semantic Similarity In Electronic Health Records Search

Posted on: 2019-01-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Y Zhang
GTID: 1318330542995343
Subject: Signal and Information Processing
Abstract/Summary:
Electronic Health Records (EHRs) search is a major component of information retrieval technology. As web search engines have matured, EHRs search has made considerable progress. However, EHRs search differs from web search and literature search in one respect: its task is to find similar patients with respect to a query, and the target of retrieval is a collection of EHRs. The temporal order of the reports in an EHRs collection carries a great deal of information about the temporal pattern of the EHRs, so the temporal dimension is very important. How to model the temporal dimension and measure temporal similarity is therefore a crucial problem.

The search models prevalent in recent years rank documents according to their relevance to the input query. Because queries are brief and concise, however, they do not contain any temporal information, and temporal information has consequently not been used sufficiently in current search models. Previous studies on the temporal information of EHRs mainly focus on temporal recognition and extraction, which represent temporal information as temporal expressions. Such methods suffer from the limited expressiveness of textual expressions. Finding a computationally convenient method to model and utilize temporal information in EHRs search thus remains a challenge.

To find a better way to utilize temporal information, this work conducts a systematic study of the representation and quantification of temporal information in EHRs. We first analyze the importance of the temporal order of medical terms in EHRs and the irregularity of the temporal distribution of EHRs. On this basis, we propose three methods for temporal information representation, suited to different application scenarios: static temporal distribution representation, dynamic vector-sequential representation, and embedded temporal representation. Second, we propose three tempo-semantic similarity measures, one for each representation framework, from the perspectives of static/dynamic and explicit/embedded. Specifically, in the dynamic method, we first treat a person's EHRs as a temporal sequence and model the temporal and semantic dimensions simultaneously in a tempo-semantic vector space, which allows dynamic matching between two sequences and similarity calculation. In the embedded method, we propose an adapted recurrent neural network to learn the underlying tempo-semantic patterns of EHRs and represent them with vectors. Finally, to incorporate the tempo-semantic similarity into the retrieval model, we propose a clustering-based method that combines it with the relevance between the query and the document for re-ranking.

The representation methods proposed in this work make the calculation of temporal similarity possible in EHRs search, and the three tempo-semantic similarity measures are built on them. We conduct extensive experiments on the Text Retrieval Conference (TREC) Medical Track dataset using our methods, classic retrieval models, and the state-of-the-art methods that participated in TREC. The results of the proposed methods improve significantly over the classic models and surpass the state-of-the-art methods.
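The dynamic matching between two patients' sequences described above can be illustrated with a small sketch. The abstract does not name the matching algorithm, so this example substitutes classic dynamic time warping as a stand-in. The report representation (a day offset paired with a term vector), the cost function, and the `time_weight` parameter are all assumptions made for illustration, not the dissertation's method.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def report_cost(a, b, time_weight=0.1):
    """Hypothetical tempo-semantic cost between two reports.

    Each report is (day_offset, term_vector). The cost adds the semantic
    distance to a weighted gap in days; the weighting is an assumption.
    """
    (day_a, vec_a), (day_b, vec_b) = a, b
    return cosine_distance(vec_a, vec_b) + time_weight * abs(day_a - day_b)


def dtw_distance(seq_a, seq_b, time_weight=0.1):
    """Dynamic time warping distance between two report sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # table[i][j] = minimal cost of aligning the first i and first j reports
    table = [[inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = report_cost(seq_a[i - 1], seq_b[j - 1], time_weight)
            table[i][j] = cost + min(
                table[i - 1][j],      # seq_a advances alone
                table[i][j - 1],      # seq_b advances alone
                table[i - 1][j - 1],  # both advance together
            )
    return table[n][m]


# Toy data: day offsets since the first report, two-dimensional term vectors.
patient_a = [(0, [1.0, 0.0]), (30, [0.0, 1.0])]
patient_b = [(0, [1.0, 0.0]), (30, [0.0, 1.0])]
patient_c = [(0, [0.0, 1.0]), (90, [1.0, 0.0])]

same = dtw_distance(patient_a, patient_b)
different = dtw_distance(patient_a, patient_c)
```

In this toy setup, a patient whose reports carry the same terms at the same times gets a distance near zero, while reversing the order of the terms or stretching the time gap increases the distance. A lower distance would correspond to a higher tempo-semantic similarity.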
Keywords/Search Tags:Electronic Health Records Search, Tempo-semantic Similarity, Temporal Information Modeling, Temporal Vector Space Model, Embedding