
Chinese Medical Case Data Mining Technology

Posted on: 2010-07-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Zhang
GTID: 2208360275498896
Subject: Computer application technology
Abstract/Summary:
The medical records of TCM (Traditional Chinese Medicine) experts are the crystallization of famous herbalist doctors' experience, and Data Mining (DM) can help recover these doctors' clinical experience and their patterns of medication use. However, the medical records are usually unstructured, so Text Mining technology must first be used to extract information from them and convert them into structured form, which is the foundation for mining.

In this thesis, Text Mining technology is studied first, with a focus on Text Classification and Information Extraction. These techniques are then applied to structure the medical records of famous herbalist doctors. Based on the structured records, several data mining methods are used to mine clinical experience. The concrete research work is as follows:

1. A study of Chinese text classification based on character features. Information Gain is applied to select features, cosine distance to measure the similarity between documents, and KNN as the classifier. Systematic comparative experiments were conducted on the news corpus from Fudan University, achieving 86.92% precision and an 87% Macro-F score. The experimental results indicate that character-based features are an effective modeling method for Chinese text classification.

2. A study of information extraction for extracting terms from clinical medical records. For structured medical records, the Meta-Bootstrapping algorithm was adopted to extract terms, and a pattern structure was designed for this purpose. The algorithm starts from a few manually provided seed words and accomplishes term extraction after several iterations; it requires neither shallow Chinese NLP techniques nor a labeled training corpus. Experiments were carried out on 206 clinical medical records, extracting the names of prescriptions, the dialectical information and the rules of treatment, with F1 scores of 64.29%, 56.21% and 76.64% respectively. On the basis of term extraction, unstructured medical records are converted into structured records.

3. Based on medical records processed by text classification and information extraction, data preprocessing for a Data Mining system of Traditional Chinese Medicine was studied, which provides clean, structured data for the subsequent mining work.

4. Based on the structured symptom information in medical records, a latent structure of syndrome differentiation of chronic gastritis was studied. An improvement was made to the current latent structure model based on factor analysis, which improved both the accuracy of the model and the training speed.

5. Based on structured prescriptions, the dose-effect relations of Chinese medicine were mined. An agglomerative clustering algorithm based on weighted Euclidean distance was designed and implemented. The experiment on the asthma clinical records of a famous herbalist doctor reveals the essentials of his experience, and the results are well supported by the theory of Traditional Chinese Medicine.
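As an illustration of the approach in item 1, the following is a minimal sketch of character-feature KNN classification with cosine similarity. It is not the thesis implementation: the Information Gain feature selection step is omitted, and the function names, the toy corpus and the labels are all assumptions made for this example.

```python
import math
from collections import Counter


def char_features(text):
    """Bag of characters: every non-space character is one feature."""
    return Counter(ch for ch in text if not ch.isspace())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(text, training, k=3):
    """Majority vote over the k training documents most similar to text."""
    query = char_features(text)
    scored = sorted(
        ((cosine_similarity(query, char_features(doc)), label)
         for doc, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus, not the Fudan University news corpus.
TRAINING = [
    ("股市上涨 银行利率", "finance"),
    ("银行贷款 股票基金", "finance"),
    ("足球比赛 球队夺冠", "sports"),
    ("篮球比赛 球员得分", "sports"),
]

print(knn_classify("银行股票", TRAINING, k=3))
print(knn_classify("球队比赛", TRAINING, k=3))
```

Working at the character level avoids the need for a Chinese word segmenter, which is the property the abstract highlights for this feature model.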
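The clustering in item 5 can be sketched in a similar spirit. The abstract does not state the linkage criterion, the weights, or the data layout, so the average linkage, the weight values and the dose vectors below are assumptions chosen only to show how a weighted Euclidean distance plugs into bottom-up (agglomerative) merging.

```python
import math


def weighted_euclidean(x, y, weights):
    """Euclidean distance with a per-dimension weight on each squared difference."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def agglomerative(points, weights, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Average linkage is an assumption of this sketch; the source does not
    say which linkage the thesis uses.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        total = sum(
            weighted_euclidean(points[i], points[j], weights)
            for i in c1
            for j in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > max(n_clusters, 1):
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical doses (grams) of three herbs in five prescriptions.
DOSES = [(9, 3, 0), (10, 3, 0), (9, 4, 0), (3, 12, 6), (3, 10, 6)]
# Hypothetical weights: the third herb counts half as much as the others.
WEIGHTS = (1.0, 1.0, 0.5)

print(agglomerative(DOSES, WEIGHTS, 2))
```

The weights let some herbs influence the grouping of prescriptions more than others; how the thesis actually derives them is not described in the abstract.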
Keywords/Search Tags: text mining, text classification, information extraction, Meta-Bootstrapping algorithm, EM algorithm, latent structure model, agglomerative clustering