
Data-Mining Methods Study And Its Application In Traditional Chinese Prescription Compatibility Analysis

Posted on: 2004-03-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Li
GTID: 1118360125952980
Subject: Traffic Information Engineering & Control
Abstract/Summary:
Traditional Chinese Medicine (TCM) has a long history and has contributed greatly to the prosperity of the Chinese nation. The Traditional Chinese Prescription (TCP) is an important part of TCM; hundreds of thousands of prescriptions are recorded in the historical literature. Mining the compatibility of TCP data with modern information technology, especially data mining, is an effective way to speed up the modernization of TCM and TCP. Data mining is a collection of methods, drawn from machine learning, pattern recognition, database technology and related fields, for solving large practical problems. Its purpose is to discover implicit knowledge and to help human experts make decisions.

This thesis studies TCP data-mining methods in connection with a national project and uses them to analyse the compatibility of TCP.

Frequent itemset mining is an important area of data mining. Some studies adopt an Apriori-like candidate set generation-and-test approach, but candidate set generation is very time-consuming. FP-growth is an important frequent itemset mining algorithm that generates frequent itemsets without candidate sets. Based on an analysis of FP-growth, this thesis proposes a new algorithm, FP-growth*, which is much faster and also easy to implement. By adopting modified data structures for the FP-tree and the header table, FP-growth* builds the FP-tree only once and generates a header table in each recursive step. The new algorithm produces the same frequent itemsets as FP-growth, but the performance study shows that FP-growth* is at least twice as fast as FP-growth.

Algorithm GRG (Graph based method for association Rules Generation) is proposed for mining association rules on the basis of frequent closed itemsets. Frequent closed itemsets are a subset of the frequent itemsets, yet they contain all the information of the frequent itemsets.
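The relationship between frequent itemsets and frequent closed itemsets can be illustrated with a minimal sketch. It is neither FP-growth* nor GRG: it is a plain brute-force enumeration, and the prescriptions, herb names, and the `min_support` value are hypothetical data chosen only for illustration. It shows that the support of every frequent itemset can be recovered from the closed ones, which is the sense in which no information is lost.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration: {itemset: support count} for all frequent itemsets."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                result[cset] = support
                found = True
        if not found:
            break  # no frequent itemset of this size, so none of a larger size
    return result

def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support."""
    return {
        itemset: support
        for itemset, support in frequent.items()
        if not any(itemset < other and support == s
                   for other, s in frequent.items())
    }

def support_from_closed(itemset, closed):
    """Recover an itemset's support as the largest support of a closed superset."""
    return max(s for c, s in closed.items() if itemset <= c)

# Hypothetical prescriptions, each a set of herbs (illustrative data only).
prescriptions = [
    frozenset({"ginseng", "licorice", "ginger"}),
    frozenset({"ginseng", "licorice"}),
    frozenset({"ginseng", "licorice", "jujube"}),
    frozenset({"ginger", "jujube"}),
]

frequent = frequent_itemsets(prescriptions, min_support=2)
closed = closed_itemsets(frequent)
recovered = {i: support_from_closed(i, closed) for i in frequent}
```

In this toy data five itemsets are frequent but only three are closed, and `recovered` equals `frequent`.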
GRG constructs an association graph to represent the frequent relationships between items and recursively generates frequent closed itemsets from that graph. It also constructs a lattice graph of the frequent closed itemsets and generates association rules based on the lattice graph. It scans the database only twice and avoids candidate set generation. GRG shows good performance in both speed and scale-up properties.

A new algorithm, PFP-growth (Parallel FP-growth), based on FP-growth*, is proposed for parallel frequent itemset mining. PFP-growth distributes the task fairly among the parallel processors. Partitioning strategies are devised at different stages of the mining process to balance the load between processors, and new data structures are adopted to reduce the information transferred between processors. Experiments on a national high-performance parallel computer show that PFP-growth is an efficient parallel algorithm for mining frequent itemsets.

SQL-based methods for rough set computation, including the computation of equivalence classes and of the positive region, are provided. The concepts of relative and absolute importance in rough-set-based importance evaluation are proposed, and a condition for absolute importance is given together with its proof. The differences between importance evaluation based on rough sets and importance evaluation based on frequency statistics are also discussed. The important medicines for chronic viral hepatitis type B (HBV) are analyzed using the rough set importance evaluation method.

The problems of data reduction, including relative reduction and absolute reduction, are introduced and unified as set operations on a difference list derived from the difference matrix.
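One common way to read a difference list is sketched below, under stated assumptions. Each entry is the set of attributes on which a pair of objects differs; for relative reduction only pairs with different decision values are compared, and for absolute reduction all pairs are compared. An attribute set that intersects every entry then preserves the distinctions. The decision table, the attribute names `a`, `b`, `c`, `d`, and the helper names are hypothetical, and the greedy selection is a simple stand-in for illustration, not the ant colony heuristic of the thesis; greedy selection is not guaranteed to return a minimal set in general.

```python
from itertools import combinations

def difference_list(rows, attributes, decision=None):
    """Collect the non-empty attribute sets on which pairs of objects differ.

    With `decision` given, only pairs whose decision values differ are
    compared (relative reduction); otherwise all pairs are compared
    (absolute reduction).
    """
    entries = set()
    for x, y in combinations(rows, 2):
        if decision is not None and x[decision] == y[decision]:
            continue
        diff = frozenset(a for a in attributes if x[a] != y[a])
        if diff:
            entries.add(diff)
    return entries

def greedy_cover(entries):
    """Greedily pick attributes until every difference entry is intersected."""
    remaining = set(entries)
    chosen = set()
    while remaining:
        counts = {}
        for entry in remaining:
            for attr in entry:
                counts[attr] = counts.get(attr, 0) + 1
        # Most frequent attribute first; ties broken alphabetically.
        best = max(sorted(counts), key=lambda k: counts[k])
        chosen.add(best)
        remaining = {e for e in remaining if best not in e}
    return chosen

# Hypothetical decision table: condition attributes a, b, c and decision d.
table = [
    {"a": 1, "b": 0, "c": 0, "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
    {"a": 0, "b": 0, "c": 1, "d": "yes"},
    {"a": 0, "b": 1, "c": 1, "d": "no"},
]
conditions = ["a", "b", "c"]

relative = greedy_cover(difference_list(table, conditions, decision="d"))
absolute = greedy_cover(difference_list(table, conditions))
```

For this table, attribute `b` alone separates the decision classes, while distinguishing all objects from one another also requires `a` (or, equivalently here, `c`).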
A heuristic reduction algorithm based on the ant colony system is proposed.

Finally, the study of TCP compatibility is introduced, including the history and characteristics of TCP, the pretreatment of TCP data, the construction of the TCP database, and the design of the TCP analysis system.
Keywords/Search Tags: Data mining, Frequent itemset, Frequent closed itemset, Association rules, Graph, Parallel algorithm, Rough set, Attribute importance, Knowledge reduction, Data pretreatment, TCP, Compatibility