Research Of Medical Records Semantic Retrieval Method Based On LDA And LSA

Posted on: 2015-11-25
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Shi
GTID: 2308330473953709
Subject: Computer software and theory
Abstract/Summary:
In recent years, the explosive growth of medical records data has posed a major challenge to information retrieval technology. The most widely used traditional retrieval models often overlook the latent semantic structure of text, while medical records contain synonyms, polysemous words, and other sources of uncertainty, which makes it difficult for users to retrieve relevant information quickly and accurately. In view of these characteristics of medical records data and the problems they cause, this thesis studies two semantic retrieval models: LSA (Latent Semantic Analysis) and LDA (Latent Dirichlet Allocation). Both models overcome the inability of traditional retrieval models to handle polysemy and synonymy: they mine the latent semantics of text and retrieve results according to the semantic associations among words, texts, and pseudo-texts, thereby improving the search results. Building on a corpus constructed for the medical domain, the research covers the following aspects:
1. In the LSA model, the traditional TF-IDF weight is computed with linear processing and ignores the important influence of the positions at which terms appear. This thesis therefore proposes and implements a semantic retrieval algorithm for medical records based on an improved LSA model. When computing weights, the improved model applies nonlinear processing and a location weighting factor; it then builds a latent semantic space through truncated singular value decomposition, projects terms and texts into that space, and extracts the deep semantic relations between words. The thesis also proposes a precision-based method for determining the optimal value of K. Experimental results show that the improved LSA model effectively handles synonyms and improves retrieval performance on medical records.
2. Because traditional retrieval algorithms do not handle large-scale medical records data well, this thesis uses the LDA model to build a topic model, applies Gibbs sampling for parameter inference to compute the model parameters indirectly, and obtains each text's probability distribution over the topic set. The thesis also proposes an effective method for determining the optimal number of topics T. Finally, the experimental results are analyzed to verify the feasibility of the LDA model for semantic retrieval of medical records.
3. The improved LSA model relies on singular value decomposition, which has high computational complexity and is ill-suited to text collections that change dynamically, while the LDA model does not account for keyword weights. To address both problems, this thesis proposes and implements a semantic retrieval algorithm for medical records based on a combined model. Experimental results show that, while maintaining recall, the algorithm improves the precision of medical records retrieval to some extent, which verifies the soundness of the proposed approach.
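The abstract does not give the improved weighting formula used in item 1. The following is a minimal sketch under stated assumptions: "nonlinear processing" is taken to be log(1 + tf) dampening, and the location weighting factor is modeled as a per-field multiplier. The field names and values in the hypothetical LOCATION_WEIGHTS table are illustrative only and are not taken from the thesis.

```python
import math
from collections import Counter

# Hypothetical location weights: the thesis does not publish its values,
# so these field names and multipliers are illustrative only.
LOCATION_WEIGHTS = {"title": 2.0, "diagnosis": 1.5, "body": 1.0}


def weighted_tf(fields):
    """Location-weighted term frequency for one medical record.

    `fields` maps a field name (e.g. "title") to that field's token list;
    each occurrence of a term adds the weight of the field it appears in.
    """
    tf = Counter()
    for field, tokens in fields.items():
        weight = LOCATION_WEIGHTS.get(field, 1.0)
        for token in tokens:
            tf[token] += weight
    return tf


def tfidf_vectors(records):
    """Return one {term: weight} dict per record.

    Nonlinear processing is assumed to be log(1 + tf) dampening of the
    location-weighted frequency, multiplied by the standard log(N / df) IDF.
    """
    tfs = [weighted_tf(r) for r in records]
    n_records = len(records)
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())  # each record counts once per distinct term
    return [
        {t: math.log(1.0 + f) * math.log(n_records / df[t]) for t, f in tf.items()}
        for tf in tfs
    ]


records = [
    {"title": ["asthma"], "body": ["cough", "wheeze", "asthma"]},
    {"title": ["pneumonia"], "body": ["cough", "fever"]},
]
vectors = tfidf_vectors(records)
print(vectors[0])
```

In this toy example "cough" gets weight 0 because it occurs in every record, and "asthma" outweighs "wheeze" in the first record because its title occurrence counts double before the logarithm is applied.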
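The LSA step in item 1 (truncated SVD, projecting texts and queries into the latent space, and choosing K by precision) can be sketched with NumPy as follows. The query folding rule q^T U_k S_k^{-1} is the standard LSA formulation. The grid search over candidate K values scored by mean precision@n is only an assumed reading of "a method of determining the optimal K based on precision". The function names and the toy matrix are hypothetical.

```python
import numpy as np


def truncated_svd(term_doc, k):
    """Rank-k SVD of a (terms x records) weight matrix: A ~= U_k S_k V_k^T.

    k must not exceed the rank of term_doc, or S_k contains zeros.
    """
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


def rank_documents(u_k, s_k, vt_k, query_vec):
    """Fold the query into the latent space and rank records by cosine.

    The query is projected as q^T U_k S_k^{-1}, the same coordinates as the
    rows of V_k (one row per record).
    """
    q_hat = (query_vec @ u_k) / s_k
    docs = vt_k.T
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat)
    sims = (docs @ q_hat) / np.maximum(norms, 1e-12)
    return np.argsort(-sims, kind="stable")


def precision_at_n(ranking, relevant, n):
    """Fraction of the top-n ranked records that are judged relevant."""
    return sum(1 for d in ranking[:n] if d in relevant) / n


def choose_k(term_doc, queries, candidate_ks, n):
    """Return (K, mean precision@n) for the best K among candidate_ks.

    queries: list of (query_vector, set_of_relevant_record_indices).
    """
    best_k, best_p = None, -1.0
    for k in candidate_ks:
        u_k, s_k, vt_k = truncated_svd(term_doc, k)
        p = float(np.mean([precision_at_n(rank_documents(u_k, s_k, vt_k, q), rel, n)
                           for q, rel in queries]))
        if p > best_p:  # on ties, keep the earlier candidate
            best_k, best_p = k, p
    return best_k, best_p


# Toy term-by-record matrix; rows: heart, cardiac, attack, fracture, bone.
A = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 0, 1]], dtype=float)
query = np.array([0, 1, 0, 0, 0], dtype=float)  # the single term "cardiac"
u_k, s_k, vt_k = truncated_svd(A, 2)
print(rank_documents(u_k, s_k, vt_k, query))
print(choose_k(A, [(query, {0, 1})], candidate_ks=[1, 2, 3], n=2))
```

Record 0 never contains "cardiac", so its cosine with the query in the raw term space would be 0. With K = 2, however, it ranks ahead of record 2 alongside record 1, because "heart" and "cardiac" both co-occur with "attack". This is the synonym effect the abstract describes. On this toy data the full-rank K = 3 scores lower precision than the truncated spaces.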
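Item 2 states only that Gibbs sampling is used to infer the LDA parameters and obtain each text's topic distribution. The following is a generic collapsed Gibbs sampler in plain Python, offered as a sketch. The hyperparameters alpha and beta, the iteration count, and the query-likelihood ranking rule are assumptions. The abstract does not say how the thesis selects T. The perplexity helper is one commonly used criterion, included here purely as a stand-in. In practice it would be computed on held-out records, not on the training records used below for brevity.

```python
import math
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    `docs` is a list of token-id lists (ids in range(vocab_size)). Returns
    theta (document-topic distributions) and phi (topic-word distributions).
    """
    rng = random.Random(seed)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]               # tokens of topic k in doc d
    nkw = [[0] * vocab_size for _ in topics]           # times word w has topic k
    nk = [0] * n_topics                                # tokens assigned topic k
    z = []                                             # current topic of each token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(n_topics)                # random initial assignment
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][w] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                            # remove token i from the counts
                ndk[d][t] -= 1
                nkw[t][w] -= 1
                nk[t] -= 1
                # Full conditional p(z_i = k | z_-i, w), up to a constant.
                weights = [(ndk[d][k] + alpha) * (nkw[k][w] + beta)
                           / (nk[k] + vocab_size * beta) for k in topics]
                t = rng.choices(topics, weights=weights)[0]
                z[d][i] = t                            # add it back with the new topic
                ndk[d][t] += 1
                nkw[t][w] += 1
                nk[t] += 1
    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][w] + beta) / (nk[k] + vocab_size * beta) for w in range(vocab_size)]
           for k in topics]
    return theta, phi


def query_log_likelihood(query, theta_d, phi):
    """log p(query | d) = sum over query words of log sum_k theta_d[k] * phi[k][w]."""
    return sum(math.log(sum(theta_d[k] * phi[k][w] for k in range(len(phi))))
               for w in query)


def rank_by_lda(query, theta, phi):
    """Record indices ordered by descending query likelihood."""
    scores = [query_log_likelihood(query, th, phi) for th in theta]
    return sorted(range(len(theta)), key=lambda d: scores[d], reverse=True)


def perplexity(docs, theta, phi):
    """exp(-mean log-likelihood per token); lower is better."""
    log_lik = sum(query_log_likelihood(doc, theta[d], phi) for d, doc in enumerate(docs))
    n_tokens = sum(len(doc) for doc in docs)
    return math.exp(-log_lik / n_tokens)


docs = [[0, 1, 0, 1, 2], [0, 2, 1, 1], [3, 4, 5, 3], [4, 5, 5, 3, 4]]
theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=6, iters=100, seed=1)
print(rank_by_lda([0, 2], theta, phi))
print(perplexity(docs, theta, phi))
```

Comparing perplexity across several candidate values of T, on held-out records, is one way to pick the number of topics. Whether it matches the thesis's own method is not stated in the abstract.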
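The abstract does not describe how the combined model in item 3 merges LSA and LDA. One simple possibility is sketched here purely as an assumption: min-max normalize each model's per-record scores, then interpolate them with a hypothetical mixing weight lam. The function names and the sample scores are illustrative, not taken from the thesis.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_rankings(lsa_scores, lda_scores, lam=0.5):
    """Linear interpolation of normalized LSA and LDA scores per record.

    `lam` is a hypothetical mixing weight: 1.0 uses only LSA, 0.0 only LDA.
    Returns record indices ordered from best to worst, plus the fused scores.
    """
    a, b = min_max(lsa_scores), min_max(lda_scores)
    fused = [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]
    order = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
    return order, fused


# LSA cosines and LDA log-likelihoods for three records (illustrative numbers).
order, fused = fuse_rankings([0.9, 0.2, 0.5], [-3.0, -1.0, -2.0], lam=0.7)
print(order, fused)
```

Normalizing first matters because cosine similarities and log-likelihoods are on different scales. Without it, whichever model produces the larger numeric range would dominate the fused score regardless of lam.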
Keywords/Search Tags: Medical Records, Information Retrieval, Latent Semantic Analysis Model, Singular Value Decomposition, Latent Dirichlet Allocation Model