
Rough Computation Models And Algorithms For Knowledge Discovery From Heterogenous Data

Posted on: 2009-11-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q H Hu
GTID: 1118360278962039
Subject: Control Science and Engineering
Abstract/Summary:
Machine learning and knowledge discovery are among the most important problems in artificial intelligence, and uncertainty and inconsistency are the key difficulties in discovering knowledge from complex data. Rough set theory, which simulates the granulation and approximation abilities of human cognition, has proven to be an effective mathematical tool for characterizing inconsistency in classification data. The theory has been applied to knowledge discovery from symbolic data. However, most data sets in real-world applications are numerical, fuzzy, or a mixture of these, and little work has so far addressed knowledge discovery from such heterogeneous data with rough sets. This work proposes that there are six types of consistency in human reasoning, and mathematical models of these types of consistency are built on the granulation and approximation mechanisms of rough sets. Moreover, a uniform model and algorithms are developed for knowledge discovery from heterogeneous data. The main contributions are as follows.

First, a neighborhood rough set model and algorithms are constructed for general metric spaces. Objects described by numerical attributes can be regarded as points in a metric space, and the neighborhoods of these points form a granulation of the universe. Based on this neighborhood granulation, a rough set model is developed for classification analysis in metric spaces. Neighborhood rough sets provide a framework for analyzing the consistency of classification with numerical or symbolic features. If the neighborhood size is regarded as the granularity of the analysis, varying it yields a multi-granularity data analysis tool. Algorithms for sample and feature reduction are constructed based on the neighborhood model.

Second, a kernelized fuzzy rough model is developed for rough computation with heterogeneous data.
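To make the neighborhood granulation of the first contribution concrete, here is a minimal Python sketch. The abstract gives no formulas, so the definitions are assumptions following a common formulation: the neighborhood of a sample is every sample within Euclidean distance `delta` on the chosen features, a sample lies in the positive region when its whole neighborhood shares its class, dependency is the fraction of such samples, and feature reduction is a forward greedy search on dependency. The function names and the toy data are hypothetical.

```python
import math

def neighborhood(data, i, features, delta):
    """Indices of samples within Euclidean distance delta of sample i on the given features."""
    xi = data[i]
    return [j for j, xj in enumerate(data)
            if math.sqrt(sum((xi[f] - xj[f]) ** 2 for f in features)) <= delta]

def dependency(data, labels, features, delta):
    """|POS| / |U|: share of samples whose whole neighborhood carries their own label."""
    pos = sum(1 for i in range(len(data))
              if all(labels[j] == labels[i] for j in neighborhood(data, i, features, delta)))
    return pos / len(data)

def greedy_reduct(data, labels, delta, eps=1e-9):
    """Forward greedy feature selection: repeatedly add the feature that most raises dependency."""
    remaining = list(range(len(data[0])))
    selected, best = [], 0.0
    while remaining:
        f = max(remaining, key=lambda c: dependency(data, labels, selected + [c], delta))
        score = dependency(data, labels, selected + [f], delta)
        if score - best <= eps:  # stop when no remaining feature adds dependency
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected, best

if __name__ == "__main__":
    # Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
    data = [[0.1, 0.9], [0.2, 0.1], [0.8, 0.85], [0.9, 0.2]]
    labels = ["a", "a", "b", "b"]
    print(greedy_reduct(data, labels, delta=0.15))  # ([0], 1.0)
```

Here `delta` plays the role of the granularity mentioned above: rerunning the search with larger or smaller values gives the multi-granularity view, since a larger neighborhood merges more samples into each granule and makes the positive region harder to reach.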
Current research on fuzzy rough sets focuses on constructing fuzzy rough approximation operators, whereas little attention has been paid to fuzzy granulation. It is found that a class of kernel functions can be used to compute fuzzy T-equivalence relations between samples, so these kernels can serve to build the fuzzy granular structures of fuzzy rough sets. Based on this observation, a kernelized fuzzy rough set model is proposed for analyzing consistency in the fuzzy case. The connection between fuzzy dependency and ReliefF is shown. The idea behind ReliefF is introduced into fuzzy rough set based attribute reduction to reduce the influence of noise, and a generalized classification certainty measure is constructed.

Third, a rough set model for fuzzy preference analysis is developed. Ordered classification is a class of learning tasks in decision modeling and multi-criteria analysis. Fuzzy preference relations, which are widely used in multi-criteria analysis, are introduced and combined with the general fuzzy rough set model. The result is a fuzzy preference rough set model, for which algorithms for dependency analysis and attribute reduction are developed.

Fourth, a general fuzzy rough set model is discussed that gives a uniform definition of lower and upper approximations for all kinds of rough sets, providing a uniform viewpoint for theoretical analysis and algorithm design. Moreover, based on this general model, a uniform information measure is designed for Pawlak rough sets, neighborhood rough sets, fuzzy rough sets, and fuzzy preference rough sets.

Fifth, the stability of the attribute evaluation functions and attribute reduction algorithms proposed in this work is evaluated. Shannon entropy and fuzzy entropy are found to be more robust than dependency and consistency, while neighborhood consistency and neighborhood dependency are the least stable.

Sixth, a system is developed for rough set based knowledge discovery from heterogeneous data.
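The kernel-based granulation of the second contribution can be sketched in a few lines of Python. Because the abstract states no formulas, the following are assumptions: the Gaussian kernel is used as the fuzzy similarity relation (it is reflexive and symmetric, and it is reported to be transitive under the cosine T-norm, which places it in the class of kernels described above); the membership of a sample in the fuzzy lower approximation of its own class is the minimum of `1 - R(x, y)` over samples `y` of other classes; and fuzzy dependency is the mean of these memberships. The function names and the width parameter `delta` are hypothetical.

```python
import math

def gaussian_kernel(x, y, delta):
    """Fuzzy similarity R(x, y) = exp(-||x - y||^2 / delta); equals 1.0 when x == y."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / delta)

def project(sample, features):
    """Keep only the selected feature values of a sample."""
    return [sample[f] for f in features]

def fuzzy_lower(data, labels, i, features, delta):
    """Membership of sample i in the fuzzy lower approximation of its own class."""
    xi = project(data[i], features)
    gaps = [1.0 - gaussian_kernel(xi, project(data[j], features), delta)
            for j in range(len(data)) if labels[j] != labels[i]]
    return min(gaps, default=1.0)  # no sample from another class: fully certain

def fuzzy_dependency(data, labels, features, delta):
    """Mean lower-approximation membership: a graded analogue of |POS| / |U|."""
    return sum(fuzzy_lower(data, labels, i, features, delta)
               for i in range(len(data))) / len(data)

if __name__ == "__main__":
    # Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
    data = [[0.1, 0.9], [0.2, 0.1], [0.8, 0.85], [0.9, 0.2]]
    labels = ["a", "a", "b", "b"]
    for f in range(2):
        print(f, round(fuzzy_dependency(data, labels, [f], delta=0.1), 3))
```

Unlike the crisp neighborhood version, a sample that sits close to another class is not simply dropped from the positive region. It keeps a small but nonzero certainty, so on the toy data feature 0 scores close to 1 and feature 1 scores close to 0.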
Systematic comparative experiments are conducted, and the results validate the effectiveness of the proposed techniques. Moreover, a multiple classifier system is designed by selectively combining a set of classifiers trained on rough set based reducts. In most cases, a decision system yields a set of reducts rather than a single one. Each reduct offers a different viewpoint on the classification task, and the information in different reducts is distinct and complementary. Based on theoretical results on classifier ensembles, a selective ensemble algorithm is developed using a strategy of forward greedy selection followed by post-pruning. The experiments show that the proposed algorithm yields a compact and effective classification system.

In summary, this work develops a uniform rough set model for symbolic and numerical data analysis. Neighborhood rough sets and kernelized fuzzy rough sets together give a uniform model for classification learning from heterogeneous data, and fuzzy preference rough sets give a uniform model for fuzzy preference learning with heterogeneous data. Finally, a general rough set model and an information measure model are constructed for classification and preference learning.
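The selective ensemble step can be illustrated with a small Python sketch. It is one plausible reading of "forward greedy selection and post-pruning", not the author's algorithm. Each classifier (for example, one trained per reduct) is represented only by its predictions on a validation set. The forward stage orders all classifiers greedily by plurality-vote accuracy and keeps the shortest best-scoring prefix. Post-pruning then drops any member whose removal does not lower accuracy. The function names, the validation-accuracy criterion, and the tie-breaking rule are assumptions.

```python
from collections import Counter

def vote_accuracy(preds, members, y):
    """Plurality-vote accuracy of the classifiers in `members`; ties go to the label seen first."""
    correct = 0
    for k, target in enumerate(y):
        votes = Counter(preds[m][k] for m in members)
        correct += votes.most_common(1)[0][0] == target
    return correct / len(y)

def selective_ensemble(preds, y):
    """preds[m][k] is classifier m's prediction for validation sample k (hypothetical layout)."""
    # Forward greedy stage: order all classifiers by the vote accuracy each one adds.
    pool, order, accs = list(range(len(preds))), [], []
    while pool:
        best_c = max(pool, key=lambda c: vote_accuracy(preds, order + [c], y))
        order.append(best_c)
        pool.remove(best_c)
        accs.append(vote_accuracy(preds, order, y))
    # Keep the shortest prefix that reaches the highest accuracy.
    size = accs.index(max(accs)) + 1
    selected, best = order[:size], accs[size - 1]
    # Post-pruning: drop members whose removal does not lower accuracy.
    changed = True
    while changed and len(selected) > 1:
        changed = False
        for m in selected:
            rest = [c for c in selected if c != m]
            acc = vote_accuracy(preds, rest, y)
            if acc >= best:
                selected, best, changed = rest, acc, True
                break
    return selected, best
```

For example, with predictions `[[0, 1, 0], [0, 0, 1], [1, 1, 1], [0, 0, 0]]` against labels `[0, 1, 1]`, each of the first three classifiers is right on two of the three samples. Their vote is right on all three, so `selective_ensemble` returns `([0, 1, 2], 1.0)` and leaves out classifier 3. This complementarity between members is the property the reduct-based ensemble relies on.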
Keywords/Search Tags: heterogeneous data, rough computation, classification, ranking, information measure