Research On Knowledge Discovery Reliability In Traditional Chinese Medicine

Posted on: 2009-08-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Feng
Full Text: PDF
GTID: 1118360302458538
Subject: Computer Science and Technology
Abstract/Summary:
Reliability is a key issue in knowledge discovery, yet it has not been well explored. The wide application of knowledge discovery technology poses a significant question for the community: under which conditions is the discovery process reliable, or, put another way, under which conditions is the discovered knowledge reliable? Most existing work considers knowledge discovery reliability (KDR) in the context of specific data mining models. However, many reliability issues, such as data quality and evaluation methods, are common to different models. Systematic research on these issues is therefore necessary.

Among the application areas of knowledge discovery, one field in particular needs KDR to be considered: Traditional Chinese Medicine (TCM). As a complete medical knowledge system that has played an indispensable role in the health care of Chinese people for several thousand years, TCM has faced great pressure to develop in recent years. As a methodology capable of extracting useful patterns from data, knowledge discovery is expected to help promote the development of TCM. However, TCM data is strongly characterized by natural language, with widely varying patterns of expression, and data quality in TCM is still unsatisfactory. Knowledge discovery on data with such features requires more careful consideration of KDR.

This thesis focuses on KDR in the TCM field. It provides a systematic discussion of reliability issues across the whole life cycle of knowledge discovery, together with a process-based KDR framework named PBRF-KD. It then concentrates on three important types of KDR factors in TCM practice: structural factors, representational factors, and trustworthiness-related factors.
The major work and contributions of this thesis are as follows:

First, we propose a process-based KDR framework named PBRF-KD. As the first framework to study KDR from the process perspective, PBRF-KD provides a unified view and an effective approach for the analysis and estimation of KDR. As a model-independent framework, PBRF-KD can be applied by data analysts in various domains to assess KDR. The six steps and seven main factors in PBRF-KD provide a traceable way to analyze the reliability of knowledge discovery, and can be viewed as an applicable blueprint for analyzing KDR across the whole knowledge discovery process.

Second, we present the key structural factors with regard to KDR in TCM, and propose a series of methods to optimize them. Data completeness is analyzed as the major structural factor in TCM. For missing values in textual attributes of TCM data, we propose an imputation method based on an order-semisensitive similarity named M-Similarity. For missing labels in medical literature, we propose a multi-label text categorization approach based on M-Similarity.

Third, we present the key representational factors with regard to KDR in TCM, and propose a series of methods to optimize them. The major representational factors in TCM are representation granularity and representation consistency. For representation granularity, we propose a rule-based method of representation granularity subdivision. For representation consistency, we propose an ontology-based method to tackle representation inconsistency.

Lastly, we present the key trustworthiness-related factors with regard to KDR in TCM, and propose a series of methods to optimize trustworthiness. For the data trustworthiness issue in the TCM field, we propose a trustworthiness evaluation method based on the historical acceptance of literature, as well as a trustworthiness evaluation method based on popularity on the Web. Using these two methods to generate weights for frequent pattern mining, we propose a weighted frequent pattern mining method based on data trustworthiness, and obtain meaningful results on two TCM formula datasets.
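
The idea of using trustworthiness scores as weights in frequent pattern mining can be illustrated with a minimal sketch. It is not the method of the thesis, whose details are not given in this abstract. It assumes that each record (for example, one formula) carries a single trust weight, that the weighted support of an itemset is the summed weight of the records containing it divided by the total weight, and that brute-force enumeration is acceptable for small data. The function name `weighted_frequent_itemsets`, the herb names, and the weights are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def weighted_frequent_itemsets(transactions, weights, min_support, max_size=3):
    """Return {itemset: weighted support} for itemsets meeting min_support.

    Assumed definition (not taken from the thesis):
    weighted support = sum of weights of transactions containing the itemset
                       / sum of all transaction weights.
    """
    if len(transactions) != len(weights):
        raise ValueError("each transaction needs exactly one weight")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Accumulate the weight of every itemset (up to max_size) per transaction.
    scores = defaultdict(float)
    for items, weight in zip(transactions, weights):
        unique = sorted(set(items))
        for size in range(1, min(max_size, len(unique)) + 1):
            for itemset in combinations(unique, size):
                scores[itemset] += weight

    # Keep only itemsets whose weighted support reaches the threshold.
    return {
        itemset: score / total
        for itemset, score in scores.items()
        if score / total >= min_support
    }


# Hypothetical formulas with hypothetical trust weights in [0, 1].
formulas = [
    ["ginseng", "licorice"],
    ["ginseng", "licorice", "ginger"],
    ["ginger"],
]
trust = [0.9, 0.8, 0.3]

patterns = weighted_frequent_itemsets(formulas, trust, min_support=0.5)
for itemset, support in sorted(patterns.items()):
    print(itemset, round(support, 3))
```

With this weighting, a pattern that appears only in low-trust records contributes little support, so it is less likely to pass the threshold than the same pattern in high-trust records. A practical implementation would replace the exhaustive enumeration with a pruned search such as Apriori or FP-growth.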
Keywords/Search Tags: Knowledge Discovery, Data Mining, Reliability, Knowledge Discovery Reliability, Knowledge Discovery in Traditional Chinese Medicine, Data Quality