
Knowledge discovery in databases: An attribute-oriented rough set approach

Posted on: 1996-12-09
Degree: Ph.D.
Type: Thesis
University: The University of Regina (Canada)
Candidate: Hu, Xiaohua
GTID: 2468390014984962
Subject: Information Science
Abstract/Summary:
Knowledge discovery systems face challenging problems from real-world databases, which tend to be very large, redundant, noisy and dynamic. In this thesis, we develop an attribute-oriented rough set approach for knowledge discovery in databases. The method adopts the artificial intelligence "learning from examples" paradigm combined with rough set theory and database operations. The learning procedure consists of two phases: data generalization and data reduction. In data generalization, our method generalizes the data by performing attribute-oriented concept tree ascension; as a result, some undesirable attributes are removed and a set of tuples may be generalized to the same generalized tuple. The goal of data reduction is to find a minimal subset of interesting attributes that retains all the essential information of the generalized relation, so that this minimal subset can be used in place of the entire attribute set of the generalized relation. By removing attributes that are not important or essential, the generated rules become more concise and efficacious.

Our method integrates a variety of knowledge discovery algorithms, such as DBChar for deriving characteristic rules, DBClass for classification rules, DBDeci for decision rules, DBMaxi for maximal generalized rules, DMBkbs for multiple sets of knowledge rules and DBTrend for data trend regularities, which permit a user to discover various kinds of relationships and regularities in the data. This integration inherits the advantages of both the attribute-oriented induction model and rough set theory. The method makes several contributions to KDD. A generalized rough set model is formally defined that can handle statistical information and also takes into account the importance of attributes and objects in the databases.
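The two-phase procedure can be illustrated on a toy relation. The Python sketch below is not DBROUGH's implementation (the abstract says that system was built in C on Sybase); the relation, the concept trees `CITY_TREE` and `DEPT_TREE`, and the helpers `ascend`, `pos_size` and `minimal_reduct` are all hypothetical. Generalization climbs a concept tree one level per call and merges identical tuples while summing their counts; reduction then searches, by brute force, for a smallest attribute subset whose rough-set positive region equals that of the full condition attribute set.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy relation: (city, dept, grant) -> number of tuples.
ATTRS = ("city", "dept", "grant")
rows = Counter({
    ("Regina", "Physics", "high"): 1,
    ("Saskatoon", "Chemistry", "high"): 1,
    ("Regina", "Chemistry", "high"): 1,
    ("Calgary", "History", "low"): 1,
    ("Edmonton", "History", "low"): 1,
})

# Hypothetical concept trees: each value maps to its parent concept.
CITY_TREE = {"Regina": "Saskatchewan", "Saskatoon": "Saskatchewan",
             "Calgary": "Alberta", "Edmonton": "Alberta"}
DEPT_TREE = {"Physics": "Science", "Chemistry": "Science", "History": "Arts"}


def ascend(rel, attr, tree):
    """One step of concept tree ascension on `attr`.

    Identical generalized tuples are merged and their counts summed."""
    i = ATTRS.index(attr)
    out = Counter()
    for row, n in rel.items():
        g = list(row)
        g[i] = tree.get(g[i], g[i])  # values without a parent stay unchanged
        out[tuple(g)] += n
    return out


def pos_size(rel, cond, dec):
    """Size of the positive region: tuples whose equivalence class under
    the attribute indices in `cond` has a single decision value."""
    decisions, weight = defaultdict(set), Counter()
    for row, n in rel.items():
        key = tuple(row[i] for i in cond)
        decisions[key].add(row[dec])
        weight[key] += n
    return sum(weight[k] for k, d in decisions.items() if len(d) == 1)


def minimal_reduct(rel, cond, dec):
    """Return the first smallest subset of `cond` that preserves the
    positive region of the full condition set (exponential; toy use only)."""
    full = pos_size(rel, cond, dec)
    for k in range(len(cond) + 1):
        for subset in combinations(cond, k):
            if pos_size(rel, subset, dec) == full:
                return subset


gen = ascend(ascend(rows, "dept", DEPT_TREE), "city", CITY_TREE)
print(dict(gen))
reduct = minimal_reduct(gen, cond=(0, 1), dec=2)
print([ATTRS[i] for i in reduct])
```

Here the five original tuples collapse into two generalized tuples with counts 3 and 2, and either `city` or `dept` alone preserves the positive region; the search simply returns the first minimal subset it meets. Exhaustive search grows exponentially with the number of attributes, so it only serves to make the definition concrete.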
Our method is able to identify the essential subset of nonredundant attributes (factors) that determine the discovery task. It can learn different kinds of knowledge rules efficiently from large databases with noisy data in a dynamic environment, and it can deal with databases containing incomplete information. A prototype system, DBROUGH, was constructed under a Unix/C/Sybase environment. The system implements a number of novel ideas. It uses attribute-oriented induction rather than tuple-oriented induction, which greatly improves learning efficiency. By integrating rough set techniques into the learning procedure, the derived knowledge rules are particularly concise and pertinent, since only the attributes (factors) relevant and/or important to the learning task are considered. The combination of a transition network and a concept hierarchy provides an effective mechanism for handling the dynamic characteristics of data in the databases. For applications with noisy data, the system can generate multiple sets of knowledge rules through a decision matrix to improve learning accuracy. Experiments using the NSERC information system illustrate the promise of attribute-oriented rough set learning for knowledge discovery in databases. (Abstract shortened by UMI.)
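The abstract does not describe how its decision matrix is built. The Python sketch below assumes one common formulation from the rough set literature: rows are the objects of one decision class, columns are the objects of all other classes, and entry (i, j) lists the attribute-value pairs of object i on which it differs from object j. Every minimal set of pairs that intersects all entries of a row yields a rule for that class, and because a row can admit several such sets, one table can produce alternative rules. The table, attribute names and the functions `decision_matrix`, `minimal_rules` and `rules_for` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy decision table: (condition values, decision).
COND = ("dept", "size")
table = [
    ({"dept": "Science", "size": "large"}, "high"),
    ({"dept": "Science", "size": "small"}, "low"),
    ({"dept": "Arts", "size": "large"}, "low"),
    ({"dept": "Arts", "size": "small"}, "low"),
]


def decision_matrix(table, cls):
    """Rows: objects whose decision is `cls`; columns: all other objects.
    Entry (i, j) holds the (attribute, value of i) pairs where i and j differ."""
    pos = [x for x, d in table if d == cls]
    neg = [x for x, d in table if d != cls]
    return [[frozenset((a, xi[a]) for a in COND if xi[a] != xj[a]) for xj in neg]
            for xi in pos]


def minimal_rules(row):
    """All minimal sets of attribute-value pairs that intersect every entry
    of one matrix row, found by brute force in order of increasing size."""
    pairs = sorted(set().union(*row))
    found = []
    for k in range(1, len(pairs) + 1):
        for cand in combinations(pairs, k):
            c = set(cand)
            if all(c & entry for entry in row) and not any(f <= c for f in found):
                found.append(c)
    return found


def rules_for(table, cls):
    """Union of the minimal rules over all rows of the matrix for `cls`."""
    return {frozenset(r) for row in decision_matrix(table, cls) for r in minimal_rules(row)}


for cls in ("high", "low"):
    for rule in sorted(rules_for(table, cls), key=sorted):
        print(" AND ".join(f"{a}={v}" for a, v in sorted(rule)), "->", cls)
```

For the object (Arts, small) the sketch finds two alternative rules, `dept=Arts` and `size=small`, which is one plausible reading of how a decision matrix can supply multiple rule sets; the thesis itself may construct and combine them differently.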
Keywords/Search Tags: Data, Knowledge discovery, Rough set, System, Knowledge rules, Information