
Data Mining Techniques Guided By Domain Knowledge And Its Application In Extracting Of Traditional Chinese Pharmacy

Posted on: 2007-03-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H M Zhu
GTID: 1118360215497023
Subject: Mechanical and electrical engineering
Abstract/Summary:
Data mining (DM), also called knowledge discovery in databases (KDD), is the process of identifying valuable, potentially useful knowledge in large volumes of data. It is a promising answer to the problem of "data explosion, knowledge poverty." Domain knowledge plays an important role in data mining and can improve both the efficiency of the process and the quality of its results. How to use domain knowledge throughout the entire data mining process is an important problem that has not yet been satisfactorily solved. This dissertation addresses it through the following work:

1. The role of domain knowledge in each stage of data mining, and the ways domain knowledge can be introduced into mining systems, were discussed. Current related work on data mining techniques guided by domain knowledge was surveyed.

2. A knowledge base for data mining was designed and implemented to store and manage the domain knowledge that guides the whole data mining process. Seven types of domain knowledge for data mining were identified, and a two-level knowledge organization, consisting of index knowledge and domain knowledge, was proposed. Knowledge editing, search, and selection functions were implemented, and input and display models were designed for each kind of domain knowledge. Syntax checking for each kind of domain knowledge was discussed. The various forms of conflict and redundancy among domain rules were studied in depth, and algorithms for detecting them were given. Because the knowledge base is built on an RDBMS, many RDBMS management techniques can be reused for knowledge management.

3. Data preprocessing techniques based on domain knowledge were studied. The kinds of domain knowledge used in data preprocessing were classified, and the structure of a domain-knowledge-based preprocessing system was presented. Data cleaning techniques based on domain knowledge were discussed separately for incomplete data, erroneous data, and duplicate data. Data transformation techniques based on domain knowledge, including data discretization and data generalization, were also studied.

4. Techniques for the data mining step proper (the core stage of KDD) guided by domain knowledge were studied. A mining-algorithm selection system based on knowledge of algorithm suitability was proposed; it is easy to implement. Methods for selecting mining-algorithm parameters were discussed, and the role of domain knowledge in this selection was analyzed. Query optimization based on domain knowledge was discussed, and various optimization methods were summarized. For meta-rule-guided data mining, existing methods for instantiating meta-rules were reviewed, and a method based on connected attributes was proposed that can greatly reduce the size of the candidate set of meta-rule instantiations.

5. Knowledge evaluation techniques guided by domain knowledge were studied. Common objective interestingness measures for rules were discussed. Unexpectedness and actionability are the factors that contribute to subjective interestingness. A method for evaluating the unexpectedness of discovered rules was proposed; it considers three forms of unexpectedness as well as the uncertainty of domain knowledge. Rule templates were used to express domain knowledge about the actionability of discovered rules, and a method for evaluating actionability based on these templates was proposed.

6. A prototype data mining system, Miner2005, was implemented. It integrates knowledge base management, data source selection, data preprocessing, data mining, and knowledge evaluation. The system is characterized by its guidance from domain knowledge and its flexibility.

7. Data mining techniques were applied to optimize the extraction process in traditional Chinese pharmacy. Knowledge for selecting extraction parameters was mined from historical extraction-process data; it can help process engineers choose appropriate factors and factor levels for orthogonal tests. Classifiers with the number of extractions as the target attribute were built using the ID3 decision tree algorithm and a support vector classification algorithm. Support vector regression was used to build separate prediction models for extraction time and solvent volume.
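The abstract states that conflict and redundancy forms of domain rules were studied and check algorithms given, but it does not describe those algorithms. What follows is a minimal Python sketch of one common way to run such checks, not the dissertation's method. It assumes each rule is a conjunction of attribute = value conditions with a single attribute = value conclusion. A rule is flagged as redundant when another rule reaches the same conclusion from a strict subset of its conditions (or when it is a later exact duplicate), and a pair is flagged as a potential conflict when some record could satisfy both rules while they assign different values to the same conclusion attribute. The `Rule` class, `check_rules`, the attribute names, and the example rules are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical rule form: conjunction of (attribute, value) conditions
# implying one (attribute, value) conclusion. Each attribute appears
# at most once in a rule's conditions.
@dataclass(frozen=True)
class Rule:
    conditions: frozenset  # frozenset of (attribute, value) pairs
    conclusion: tuple      # (attribute, value)


def compatible(c1, c2):
    """True if no attribute is required to take two different values,
    i.e. some record could satisfy both condition sets."""
    d1, d2 = dict(c1), dict(c2)
    return all(d2[k] == v for k, v in d1.items() if k in d2)


def check_rules(rules):
    """Return (redundant, conflicts) as lists of index pairs.

    (i, j) in redundant: rule i is subsumed by rule j (same conclusion,
    strictly fewer conditions in j) or is a later duplicate of rule j.
    (i, j) in conflicts: rules i and j can fire on the same record but
    give different values for the same conclusion attribute.
    """
    redundant, conflicts = [], []
    for i, a in enumerate(rules):
        for j, b in enumerate(rules):
            if i == j:
                continue
            if a.conclusion == b.conclusion and (
                b.conditions < a.conditions or (a == b and j < i)
            ):
                redundant.append((i, j))
            if (i < j
                    and a.conclusion[0] == b.conclusion[0]
                    and a.conclusion[1] != b.conclusion[1]
                    and compatible(a.conditions, b.conditions)):
                conflicts.append((i, j))
    return redundant, conflicts


# Toy rules (hypothetical attribute names and values)
rules = [
    Rule(frozenset({("part", "root"), ("solvent", "water")}), ("extract_times", 2)),
    Rule(frozenset({("part", "root")}), ("extract_times", 2)),
    Rule(frozenset({("part", "root"), ("solvent", "ethanol")}), ("extract_times", 3)),
]
redundant, conflicts = check_rules(rules)
print(redundant)  # [(0, 1)]
print(conflicts)  # [(1, 2)]
```

Rule 0 is redundant because rule 1 reaches the same conclusion from fewer conditions, and rules 1 and 2 potentially conflict because a root extracted with ethanol satisfies both. The pairwise loop is quadratic in the number of rules, which is adequate for illustration but may need indexing for a large rule base.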
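Domain-knowledge-guided discretization, listed among the data transformation techniques, can be as simple as replacing data-driven cut points with intervals supplied by an expert. The Python sketch below assumes a hypothetical expert rule that splits an extraction temperature into low, medium, and high bands at 60 °C and 80 °C; the thresholds, labels, and the `discretize` helper are invented for illustration and do not come from the dissertation.

```python
import bisect

# Hypothetical expert knowledge: cut points (°C) and interval labels for
# an extraction-temperature attribute. Not taken from the dissertation.
TEMP_CUTS = [60.0, 80.0]
TEMP_LABELS = ["low", "medium", "high"]


def discretize(value, cuts=TEMP_CUTS, labels=TEMP_LABELS):
    """Map a numeric value to the label of its interval.

    With the defaults the intervals are (-inf, 60), [60, 80), [80, +inf).
    `cuts` must be sorted ascending.
    """
    if len(labels) != len(cuts) + 1:
        raise ValueError("need exactly one more label than cut points")
    # bisect_right counts how many cut points are <= value,
    # which is the index of the interval containing value.
    return labels[bisect.bisect_right(cuts, value)]


print([discretize(t) for t in (55, 60, 79.5, 95)])
# ['low', 'medium', 'medium', 'high']
```

Keeping the cut points as data rather than code means they could be stored and edited in a knowledge base alongside other domain knowledge.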
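The abstract reports ID3 classifiers with the number of extractions as the target attribute. ID3 grows a decision tree by repeatedly splitting on the attribute with the highest information gain. The sketch below computes that criterion in plain Python and picks the root split. The attribute names (`part`, `solvent`, `times`) and the four-row dataset are hypothetical toy values, not data from the dissertation, and the support vector models mentioned in the abstract are not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy of `target` minus the size-weighted entropy of the
    partitions produced by splitting `rows` on `attr` (ID3's criterion)."""
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attr], []).append(r[target])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy([r[target] for r in rows]) - remainder


def best_attribute(rows, attrs, target):
    """ID3 splits on the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy records; `times` stands in for the number of extractions.
rows = [
    {"part": "root", "solvent": "water", "times": 3},
    {"part": "root", "solvent": "ethanol", "times": 3},
    {"part": "leaf", "solvent": "water", "times": 2},
    {"part": "leaf", "solvent": "ethanol", "times": 2},
]
print(best_attribute(rows, ["part", "solvent"], "times"))  # part
```

Here splitting on `part` yields pure partitions (gain of 1 bit), while `solvent` leaves each partition evenly mixed (gain of 0), so ID3 would place `part` at the root and recurse on each branch.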
Keywords/Search Tags: Data Mining, Knowledge Discovery in Database, Domain Knowledge, Knowledge Base, Data Preprocessing, Meta Rule, Knowledge Evaluation, Subjective Interestingness, Traditional Chinese Pharmacy, Extracting Technology