Analysis Model Of Medical Text And Image Based On LDA And LSA And Its Application

Posted on: 2013-01-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Li
GTID: 1118330371482949
Subject: Communication and Information System
Abstract/Summary:
Semantic analysis technology can be applied to medical text and image data to analyze the mathematical relationships among different kinds of data through modeling and statistics. Because this kind of data analysis is free of subjective judgment, it can give doctors an objective basis for diagnosis and clinical information.
Semantic modeling of data is the basis of semantic analysis. At present, the latent variable model and the tree model are the two major research directions in semantic modeling, both in China and internationally. Given the characteristics of medical information, each modeling method has its own advantages and disadvantages:
(1) The latent variable model is better at extracting latent associations of "concept, rules, and model" from a medical information set. However, it is based on the "bag of words" idea, so the modeling process ignores the structure, location, and level of semantic elements, which are semantic features that matter to varying degrees in medical informatics applications (such as retrieval and text generation).
(2) The tree model, such as a parse tree or context tree, can reflect the semantic relations, relative locations, or spatial distribution of correlations between semantic elements. However, the object of a tree model is generally a simple probabilistic relationship or the literal semantics. It lacks information from the perspective of latent semantic analysis, so medical information cannot be processed and used at a deeper level, such as auxiliary diagnosis.
Building on this model research and addressing practical problems in the semantic analysis of medical information, this thesis studies the semantic retrieval of medical text, the semantic annotation of medical images, and the natural language generation (NLG) of diagnostic text, and proposes corresponding semantic models and methods.
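The bag-of-words limitation noted in point (1) above can be illustrated with a minimal sketch. The two sentences are hypothetical examples, not taken from the thesis data, and `bag_of_words` is an illustrative helper name rather than anything defined in the thesis. The sentences swap "left" and "right", which changes the clinical meaning, yet their word counts are identical.

```python
from collections import Counter


def bag_of_words(text):
    """Return word counts for a text, discarding word order and position."""
    return Counter(text.lower().split())


# Hypothetical sentences (assumption): same words, different clinical meaning.
a = "tumor found in left lung no tumor in right lung"
b = "tumor found in right lung no tumor in left lung"

# The two count vectors are equal, so a pure bag-of-words model
# cannot tell these sentences apart.
same_representation = bag_of_words(a) == bag_of_words(b)
print(same_representation)
```

The comparison evaluates to `True`, which is the kind of lost structural and positional information that a tree model is meant to retain.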
The research content and innovative results of this thesis are as follows:
(1) For semantic information processing of medical text, an LSA-tree model is formed by fusing the LSA model with the tree model. The LSA-tree model can extract both literal and latent semantics from semi-structured free-text medical records. First, the text is segmented with a semantic window, which divides the words into several sub-trees. Next, the literal semantic parameters between core words and related words are calculated within each sub-tree. Finally, the relations between core words are extracted through the LSA mapping in the latent semantic space. Experiments show that a semantic retrieval system for free-text medical records based on the LSA-tree model not only reduces the complexity of the original LSA model but also achieves semantic disambiguation (synonymy) of medical words, which improves retrieval precision.
(2) For semantic information processing of medical images, the LDA-tree method is used for the semantic annotation of X-ray coherent scattering images. These images have few identifiable features, an abstract image ontology, and mutually interfering image features. To address this, a tree-structure-based segmentation method is first proposed, in which the image is decomposed into regions and fragments (sub-graphs) that contain the image's semantic features. After geometric properties, photometric features, and topological properties are extracted from the sub-graphs, the energy distribution curves and the topology of the image information are quantized and encoded. Furthermore, to cross the semantic gap and realize the semantic annotation of images, this thesis introduces the parameter estimation and variational inference process of the LDA model, and uses a bag-of-visual-words model to combine the image segmentation tree model with the LDA model.
Experiments show that the semantic annotation method based on the LDA-tree model can realize semantic annotation of X-ray coherent scattering images, and that it is superior to PLSA-based semantic annotation in matching accuracy. It also better suppresses influencing factors such as imaging differences, noise, and mutual interference among image characteristics.
(3) For medical text generation and auxiliary diagnosis, an LDA-LSA-tree model is built for generating medical image diagnoses. Based on an analysis of the text characteristics of medical imaging reports, and to deal with the incomplete semantic information in the LSA-tree model, we amend the average distance to obtain the contextual location information of words and add semantic statistics of stop words to the literal semantic layer. We also propose a K-medoids content cluster analysis method based on the LSA model to cluster and pre-weight the text of medical image reports, and take the content clusters of the text as the middle semantic layer of the LSA-tree model. Building on research into natural language generation technology, we propose an LDA-LSA-tree method for natural language generation, designed according to the structure of natural language generation systems and the semantic information needed during text generation. It makes up for the shortcoming of LSA-tree in semantic inference from subject content to the mapping between words, so it can meet the dual needs of 'structure' and 'content definiteness' in the content planning and modeling of a natural language generation system. In the inference part, we use an 'association-weighting' method, introduce term frequency-inverse document frequency weighting, and realize complex semantic weighting in the Gibbs sampling process using a smoothed LDA model.
The results show that although the common keyword-matching approach to text generation is simple and easy to use, the semantic matching degree and readability of its generated text are very low, so it cannot provide much meaningful information for a doctor's diagnosis. The NLG method based on LDA-LSA-tree in this thesis takes the semantic details of medical diagnosis reports into sufficient account, so the generated result is similar to manually annotated text. Because the proposed LDA-LSA-tree model performs well as a topic model, its diagnostic accuracy is better than that of other semantic text generation models.
The free-text records, diagnosis reports, and other data used in this thesis were obtained from XX Tumor Hospitals, and the X-ray coherent scattering imaging data were obtained from Sino-Japanese XX Hospital. Each group of data was audited by medical specialists before use. The experiments compare the proposed methods with several major and recent medical information processing methods used in clinical applications, and the usability of the methods and models in this thesis is evaluated both by medical specialists and by analysis of the results against general standards.
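The term frequency-inverse document frequency weighting mentioned in point (3) can be sketched as follows. The abstract does not say which TF-IDF variant the thesis uses, so this assumes the plain form, tf = count / document length and idf = log(N / document frequency). It does not attempt the 'association-weighting' scheme or the Gibbs sampling step. The report fragments and the `tf_idf` helper name are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical report fragments (assumption), already tokenized.
reports = [
    "nodule in left lung".split(),
    "nodule in right lung".split(),
    "fracture in left rib".split(),
]
weights = tf_idf(reports)
print(weights[2])
```

Under this variant a word that occurs in every report, such as "in", receives weight zero, while a word confined to one report, such as "fracture", receives the largest weight in that report. This is the usual motivation for applying such a weighting before topic inference.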
Keywords/Search Tags: Topic Model, Latent Semantic Analysis, Latent Dirichlet Allocation, Semantic Tree Model, X-ray Coherent Scattering Image