
Data Mining In TCM Medical Records Based On Bayesian Network

Posted on: 2009-07-10
Degree: Master
Type: Thesis
Country: China
Candidate: C J Li
GTID: 2178360272989859
Subject: Computer applications
Abstract/Summary:
Traditional Chinese Medicine (TCM), one of China's national treasures, has a history of thousands of years, and herbalist doctors in every generation have accumulated a wealth of experience through long-term clinical practice. Summarizing this experience can both enrich the theoretical system of TCM and promote its modernization, so summarizing, mining and applying TCM data is an important task in China. However, traditional approaches such as master-to-apprentice transmission and manual processing make it difficult to analyze TCM data systematically and objectively. It is therefore necessary to use computer technology to mine TCM data and discover the patterns behind it.

The Bayesian network is a widely and successfully applied data mining technique, including in the medical field. The original data studied in this thesis come from clinical cases of chronic gastritis and from the medical records of famous modern herbalist doctors. To address the problems that arise when Bayesian networks are applied to these data, this thesis proposes methods for data preprocessing and for grouping dimension reduction. The main work is as follows:
(1) The characteristics and normalization of TCM medical records are discussed. Because TCM medical records are hard to quantify, this thesis proposes a quantification method based on symptom decomposition, which provides a way to process TCM data.
(2) The high symptom dimensionality and data sparsity of the chronic gastritis cases are analyzed. These properties make the symptom dimension of chronic gastritis hard to reduce. To solve this problem, this thesis proposes a grouping dimension-reduction method that combines hierarchical clustering with principal component analysis (PCA). Finally, the proposed method is applied to improve the learning ability of the Bayesian network.
The experimental results and some open problems are also discussed.
(3) This thesis describes the missing-data problem in the medical records of the "internal generation of five pathogenic factors". It then gives a comprehensive discussion of the definition of missing data and of the impact that missing data have on Bayesian network learning. After introducing a number of methods for handling missing data, the thesis proposes an improved SEM algorithm (E-SEM) based on simulated annealing and the BC algorithm. Finally, the improved algorithm is used to train a Bayesian network and to recognize the "five pathogenic factors" in the medical records, and the performance of the E-SEM algorithm is discussed.
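The grouping dimension-reduction idea in item (2) can be sketched in plain Python. This is a minimal illustration under assumptions the abstract does not state: symptom columns are clustered by average-linkage agglomerative clustering with distance 1 - |r| (absolute Pearson correlation), and each resulting group is then replaced by its first principal component score, found by power iteration on the group's covariance matrix. The function names, the distance, the linkage rule, the fixed number of groups and the toy data are all hypothetical choices, not the thesis's exact procedure.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # rows = cases, columns = symptom scores


def correlation(x: List[float], y: List[float]) -> float:
    """Pearson correlation; 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cluster_symptoms(data: Matrix, n_groups: int) -> List[List[int]]:
    """Average-linkage agglomerative clustering of symptom columns (hypothetical
    choice); distance 1 - |r| lets strongly (anti-)correlated symptoms merge first."""
    m = len(data[0])
    cols = [[row[j] for row in data] for j in range(m)]
    dist = [[1.0 - abs(correlation(cols[i], cols[j])) for j in range(m)]
            for i in range(m)]
    groups = [[j] for j in range(m)]
    while len(groups) > n_groups:
        best = None  # (average distance, index a, index b)
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                d = sum(dist[i][j] for i in groups[a] for j in groups[b])
                d /= len(groups[a]) * len(groups[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a].extend(groups[b])
        del groups[b]  # b > a, so index a is unaffected
    return groups


def first_pc_scores(data: Matrix, idx: List[int], iters: int = 200) -> List[float]:
    """Scores of each case on the first principal component of the columns
    in idx, computed by power iteration on the sample covariance matrix."""
    n, k = len(data), len(idx)
    means = [sum(row[j] for row in data) / n for j in idx]
    centred = [[row[j] - mu for j, mu in zip(idx, means)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(k)]
           for a in range(k)]
    # Non-uniform start vector: a uniform one is orthogonal to the leading
    # eigenvector when two symptoms are perfectly anti-correlated.
    v = [1.0 + 0.1 * i for i in range(k)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all columns constant: every score is 0 anyway
            break
        v = [x / norm for x in w]
    return [sum(r[c] * v[c] for c in range(k)) for r in centred]


def grouping_reduction(data: Matrix, n_groups: int) -> Tuple[List[List[int]], Matrix]:
    """Cluster symptoms into groups, then keep one PCA score per group."""
    groups = cluster_symptoms(data, n_groups)
    per_group = [first_pc_scores(data, g) for g in groups]
    reduced = [[scores[i] for scores in per_group] for i in range(len(data))]
    return groups, reduced


# Hypothetical 0/1 symptom indicators: 8 cases x 4 symptoms.
cases = [
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0],
    [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0],
]
groups, reduced = grouping_reduction(cases, n_groups=2)
print(groups)  # [[0, 1], [2, 3]]
```

One plausible motivation for clustering before PCA is that each retained component then summarizes a group of related symptoms, so every reduced variable stays interpretable, whereas a single PCA over all sparse symptoms would mix unrelated symptoms into each component. The principal component's sign is arbitrary, so only magnitudes and relative positions of the scores are meaningful.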
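The abstract does not give the details of E-SEM. Assuming SEM here refers to structural EM, which alternates an EM step over incomplete records with a search over network structures, the sketch below shows only the EM building block, on a hypothetical two-node binary network A -> B in which the parent A (for instance, a pathogenic factor) is sometimes unrecorded and the child B (a symptom) is always observed. The simulated-annealing search and the BC algorithm that E-SEM adds are not reproduced; `em_two_node`, `observed_log_likelihood` and the toy records are illustrative names and data only.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical record layout: (A, B), where A is None when missing.
Record = Tuple[Optional[int], int]


def _p_b(b: int, p_b1: float) -> float:
    """P(B = b) given P(B = 1) for the relevant parent state."""
    return p_b1 if b == 1 else 1.0 - p_b1


def em_two_node(records: Sequence[Record], p_a: float = 0.5,
                p_b: Tuple[float, float] = (0.4, 0.6),
                iters: int = 50) -> Tuple[float, List[float]]:
    """EM estimates of P(A=1) and [P(B=1|A=0), P(B=1|A=1)] when A is
    sometimes missing."""
    pb = list(p_b)
    for _ in range(iters):
        n_a = [0.0, 0.0]     # expected count of A = a
        n_a_b1 = [0.0, 0.0]  # expected count of A = a and B = 1
        # E-step: fractional counts, using P(A | B) for records missing A.
        for a, b in records:
            if a is not None:
                w = [1.0 - a, float(a)]
            else:
                joint = [(1.0 - p_a) * _p_b(b, pb[0]), p_a * _p_b(b, pb[1])]
                z = joint[0] + joint[1]
                w = [joint[0] / z, joint[1] / z] if z > 0 else [1.0 - p_a, p_a]
            for x in (0, 1):
                n_a[x] += w[x]
                if b == 1:
                    n_a_b1[x] += w[x]
        # M-step: maximum-likelihood estimates from the expected counts.
        p_a = n_a[1] / len(records)
        pb = [n_a_b1[x] / n_a[x] if n_a[x] > 0 else pb[x] for x in (0, 1)]
    return p_a, pb


def observed_log_likelihood(records: Sequence[Record], p_a: float,
                            p_b: Sequence[float]) -> float:
    """Log-likelihood of the observed values, summing A out where missing."""
    total = 0.0
    for a, b in records:
        if a is None:
            p = (1.0 - p_a) * _p_b(b, p_b[0]) + p_a * _p_b(b, p_b[1])
        else:
            p = (p_a if a == 1 else 1.0 - p_a) * _p_b(b, p_b[a])
        total += math.log(p)
    return total


# Hypothetical records; the last three have A missing.
records = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1),
           (None, 1), (None, 0), (None, 1)]
est_a, est_b = em_two_node(records)
print(est_a, est_b, observed_log_likelihood(records, est_a, est_b))
```

Because each EM iteration cannot decrease the observed-data log-likelihood, `observed_log_likelihood` offers a simple sanity check. EM can still stop at a local optimum, which is the kind of weakness a simulated-annealing component is typically meant to mitigate.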
Keywords/Search Tags: Bayesian Network, Data Mining in TCM Data, Missing Data