
Research and Application of Patient Record Text Mining Based on Latent Semantic Analysis

Posted on: 2016-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Li
Full Text: PDF
GTID: 2298330467479370
Subject: Circuits and Systems
Abstract/Summary:
The rapid growth of medical information poses great challenges for medical data analysis, especially for the analysis of medical records. Processing medical records intelligently is very difficult because of the loose grammatical structure of the data and the non-uniform terminology.

Recently, latent semantic analysis has been widely studied and applied in the field of text mining. It can uncover the latent semantics of a text as a whole as well as the latent semantics behind individual words. Furthermore, it can achieve acceptable text mining results without recovering the syntactic structure of the text. Latent semantic analysis also provides interpretability, which other algorithms may not be able to provide. These characteristics make it well suited to processing medical records in support of the work of doctors and researchers.

In this thesis, we study the characteristics of medical record collections and propose an improved latent semantic analysis model, a Latent Dirichlet Allocation model based on BM25 weighting. We verify the effectiveness of the new model on a collection of medical records and implement distributed training for it. We then analyze the problem of automatic semantic annotation and propose a model for solving it. We also generate a summary for each medical record automatically.

We study the information retrieval problem in medical record text mining and divide it into three sub-problems. First, we explore related-item generation: treatment items and drug items related to given symptoms are generated automatically from the semantic model, with semantic similarity as the evaluation criterion. Second, we solve the problem of matching similar medical records by using the semantic model directly. Third, because a short query text carries little semantic information, we address ad hoc retrieval with a method that combines a language model with the semantic model. Finally, based on these results, we build a medical records data service system that provides a retrieval service, a record summary service, and a semantic summary service.
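As an illustration of the BM25 weighting named in the abstract, the sketch below computes standard Okapi BM25 term weights for a small tokenized collection. The abstract does not say how the weights enter the Latent Dirichlet Allocation model, so treating them as a replacement for raw term counts is an assumption. The function name `bm25_weights`, the defaults `k1=1.5` and `b=0.75`, the non-negative IDF variant, and the toy records are likewise illustrative choices, not details from the thesis.

```python
import math
from collections import Counter


def bm25_weights(docs, k1=1.5, b=0.75):
    """Return one {term: BM25 weight} dict per tokenized document.

    Hypothetical helper: the weights could stand in for raw term counts
    when building the input of a topic model (an assumption, not the
    thesis's documented procedure).
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    weighted = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are damped.
        norm = k1 * (1 - b + b * len(d) / avg_len)
        weighted.append({
            # Non-negative IDF variant times the saturating TF component.
            t: math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            * f * (k1 + 1) / (f + norm)
            for t, f in tf.items()
        })
    return weighted


# Toy "medical records", already tokenized (made-up data).
records = [
    ["fever", "cough", "fever", "amoxicillin"],
    ["cough", "headache"],
    ["fever", "rash", "antihistamine"],
]
weights = bm25_weights(records)
for w in weights:
    print({t: round(v, 3) for t, v in w.items()})
```

Within a single record, a term that occurs in fewer records receives a larger weight than a more common term with the same frequency, and the weight grows with term frequency but saturates. This is one plausible way to keep very frequent clinical vocabulary from dominating the topics.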
Keywords/Search Tags: medical records, Latent Semantic Analysis, Latent Dirichlet Allocation, semantic similarity, generation of related items, medical record retrieval