The Research Of Key-techniques In Knowledge Discovery System For TCM Pharmacology

Posted on: 2007-09-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J J Hu
GTID: 1118360218962624
Subject: Computer application technology
Abstract/Summary:
Traditional Chinese Medicine (TCM) has been used to treat diseases in China for thousands of years and remains important to public health. Because TCM pharmacology is complex and incompletely understood, research on TCM with traditional methods faces many difficulties, and these hinder the inheritance and development of TCM.

Data mining (DM) is a new computing technology. It combines techniques from databases, data warehousing, artificial intelligence, machine learning, artificial neural networks, statistics, pattern recognition, information indexing, genetic algorithms and other fields, and it has proved successful in practice at extracting useful information from large volumes of data.

Supported by grants from the National Science Foundation of China (No. 60473071, 90409007) and a grant from the State Administration of Traditional Chinese Medicine (No. 2003JP40), we studied techniques for mining the properties, flavors, channel tropism, efficacy and other pharmacological information of TCM prescriptions from a large collection of prescriptions. The results can be used in TCM research.

The main contributions include:

1. The Searching Nearest Neighbor Theorem is proposed and proved. Based on this theorem, the SNN (Searching Nearest Neighbors) algorithm is proposed, with time complexity O(n log n), or O(n) when the data are obtained by scanning an image. Other nearest-neighbor search algorithms must compare all the data, with time complexity O(n^2), whereas SNN compares only a small part of the data.

2. Based on the idea that an object and its nearest neighbors most probably belong to the same cluster, the NNAF clustering algorithm is proposed for multi-dimensional data with clusters of arbitrary shape; its time complexity is O(n). In other clustering algorithms, when the threshold is adjusted, the clustering procedure must be rerun from beginning to end, which takes nearly as long as the first run. With NNAF, if the threshold is changed after a run, more than 90% of the time can be saved by performing MLCA.

3. An Elitism Producing Strategy (EPS) is proposed for the initial population of gene expression programming (GEP). It yields chromosomes with higher fitness in the initial population, so evolution starts from a higher level. Experiments show that EPS increases evolutionary efficiency by 17%.

4. To produce a high-quality initial population for GEP, the Gene Space Balance Strategy (GSBS) is proposed. The genes in an initial population produced by GSBS are diversified. Experiments show that evolutionary efficiency increases by 20%.

5. A criterion is proposed to describe quantitatively the gene diversity of a GEP population. To address the problem of local optima in standard GEP, the various population strategy (VPS-GEP) is proposed, which lets evolution escape local optima quickly. Experiments show that the VPS-GEP algorithm reduces generation stagnancy by more than 55%.

6. The design and implementation of the knowledge discovery system for TCM pharmacology are described. The algorithms proposed above are used in the system. The system architecture, database design, and preprocessing schemes are also described.
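
The abstract does not give the details of NNAF, SNN or MLCA, so the following is only a minimal Python sketch of the idea behind contribution 2, and not the dissertation's algorithm. It assumes that each point is joined to its nearest neighbor when the two lie within a distance threshold, and that clusters are the resulting connected groups. The function names `nearest_neighbors` and `cluster_from_neighbors` and the sample points are hypothetical. The neighbor search here is the brute-force O(n^2) baseline that the abstract contrasts with SNN.

```python
import math


def nearest_neighbors(points):
    """Brute-force O(n^2) baseline: for each point, return the index of
    its nearest other point and the distance to it."""
    result = []
    for i, p in enumerate(points):
        best_j, best_d = None, math.inf
        for j, q in enumerate(points):
            if i != j:
                d = math.dist(p, q)
                if d < best_d:
                    best_j, best_d = j, d
        result.append((best_j, best_d))
    return result


def cluster_from_neighbors(nn, threshold):
    """Join each point to its nearest neighbor when the distance is within
    the threshold (union-find); return one cluster label per point."""
    parent = list(range(len(nn)))

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, (j, d) in enumerate(nn):
        if j is not None and d <= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(nn))]


# Hypothetical sample data: two groups that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.5), (10.0, 10.0), (10.0, 11.0)]

# The expensive neighbor search is done once ...
nn = nearest_neighbors(points)

# ... and reused when the threshold changes.
labels_tight = cluster_from_neighbors(nn, threshold=1.2)
labels_loose = cluster_from_neighbors(nn, threshold=2.0)
```

In this sketch, changing the threshold only repeats the cheap linking step over the cached neighbor list. That is one plausible reading of why re-clustering after a threshold change could cost far less than the first run; the saving of more than 90% reported in the abstract refers to the author's own MLCA procedure, which is not reproduced here.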
Keywords/Search Tags:data mining, knowledge discovery, cluster algorithm, GEP, TCM prescription