
Research On Method And Mechanism Of KDD Based On Classification Model

Posted on: 2005-06-02; Degree: Doctor; Type: Dissertation
Country: China; Candidate: Z Q Meng; Full Text: PDF
GTID: 1118360182468696; Subject: Computer application technology
Abstract/Summary:
KDD (Knowledge Discovery in Databases) is a novel intelligent information processing technology for discovering knowledge in large databases. First, after a systematic review of related work, the mechanism of KDD based on the classification model is studied with Granular Computation, independently of any specific algorithm, and general laws are drawn out. The origin of some existing problems is then analysed. Second, new theory and methods for KDD are proposed on the basis of thorough research into a number of computational methods. As a result, PKDD (Personalized Knowledge Discovery in Databases), the efficiency and accuracy problems, and the premature convergence of some KDD algorithms are investigated and resolved. The primary work and contributions of this dissertation are as follows.
The All-Granular-Space (AllGS) and Super-Granular-Space (SGS) of an information system are built, and from them an algebraic system, a Boolean algebra, and a lattice are obtained. Using the properties of the lattice, a structural model of AllGS is also obtained: the Super-Tree (Stree), whose nodes denote granulations in AllGS. This work greatly enriches the geometric and algebraic theoretical framework of KDD, and to a certain extent it establishes an essential mathematical model and topological structure of the information system. Building on decision logic language, a new language for describing granulation, called the plus-basic language and written L+base, is defined, and an "AllGS + L+base" model for KDD is proposed and studied. The relationship between L+base, AllGS, and the basic-concept-space is established: there is a mapping from L+base to AllGS that is proved to be a surjection. The origin of PKDD is then analysed. In essence, KDD becomes the problem of finding, for every concept in Stree (or in AllGS), the best expression composed of granulations. The properties of AllGS, Stree, the lattice, the Boolean algebra, and so on play an important heuristic role in this process.
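
The granulation idea above can be illustrated with a small sketch. It assumes, as in standard rough set theory, that a granule is an equivalence class of objects that agree on a chosen set of attributes; the dissertation's actual construction of AllGS may differ. The toy table and the names `TABLE`, `granules`, and `refines` are hypothetical and serve only as illustration.

```python
# Hypothetical toy information system: object id -> attribute values.
TABLE = {
    1: {"color": "red", "size": "big"},
    2: {"color": "red", "size": "small"},
    3: {"color": "blue", "size": "big"},
    4: {"color": "red", "size": "big"},
}


def granules(table, attrs):
    """Partition objects into granules that agree on every attribute in attrs."""
    blocks = {}
    for obj, row in table.items():
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, set()).add(obj)
    return {frozenset(block) for block in blocks.values()}


def refines(fine, coarse):
    """True if every granule of `fine` lies inside some granule of `coarse`."""
    return all(any(g <= h for h in coarse) for g in fine)


by_color = granules(TABLE, ["color"])
by_both = granules(TABLE, ["color", "size"])

# Adding attributes can only split granules, never merge them, which is the
# ordering that gives a family of partitions its lattice structure.
print(refines(by_both, by_color))  # finer partition refines the coarser one
print(refines(by_color, by_both))  # the converse does not hold here
```

Under this assumption, the refinement relation between partitions is the partial order from which a lattice of granulations can be built.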
Finally, the cause of the premature convergence of some KDD algorithms is also analysed.
An algorithm for finding a personalized reduct in a decision system, written as DA-FPR, is put forward. In this algorithm, the minimal-all-space (written as SP'), spacial-covering, the (?) operation, and the x-operation are established. By applying the two operations, the (?) operation and the x-operation, alternately, superfluous condition attributes are gradually removed according to the user's bias, and SP' ultimately converges to a one-item-space, from which the expected reduct is obtained. Experiments show that the size of SP' is almost independent of the size of the data set, and that the running time of DA-FPR is spent mainly on constructing SP'; the later operations are simple ones on the set of condition attributes, and their cost is almost negligible by comparison. DA-FPR is therefore a highly efficient algorithm, and it is also proved theoretically to be convergent and complete.
Two further algorithms, DA-FPDR and PA-MRS, are proposed. The former removes the redundant condition attributes and their values from every rule, and the latter deletes every whole rule in which the given user is not interested. Together with DA-FPR, these three algorithms yield an effective method for finding personalized knowledge, PKDA. By deleting superfluous condition attributes layer by layer, the personalized knowledge that the given user is interested in is eventually discovered. This work contributes to research on "discovery of interesting knowledge", currently a hot topic in the KDD field.
Beginning with an analysis of the origin of Evolutionary Computation (EC), the characteristics of Culture Evolution, which mainly includes human evolution, are examined, and Granular Evolution Computation (GEC), consisting of Population Evolution and Super-Population-Evolution, is proposed for the first time.
EC is an intelligent computation method based on the mechanism of Darwinian evolution, whereas GEC is another one based on the mechanism of Culture Evolution; it uses Agent technology, Granular Computation, EC, and so on.
To solve the problems of premature convergence and poor efficiency in KDD algorithms, and following a thorough analysis of current EC-based KDD algorithms, the Granular Evolution Algorithm (GEA) is designed on the basis of GEC and Agent technology. This algorithm can not only decompose a complex search problem into several simple ones but also combine the strengths of the Michigan method and the Pittsburgh method. In this way it maintains the diversity of individuals during evolution, so the probability of being trapped in local optima is greatly reduced, and both its ability to reach global optima and its efficiency are improved. Analysis and experiments show that its convergence and efficiency are superior to those of existing methods.
To sum up, based on the classification model and using Granular Computation, the structural and algebraic properties of granular space are studied, and the mechanism of KDD is then analysed. These results greatly enrich the theoretical framework of KDD. After a comprehensive overview of existing methods, several problems are studied, including "discovery of interesting knowledge", premature convergence, local optima, efficiency, and accuracy, and corresponding methods are put forward. All theorems and conclusions proposed in this dissertation are proved, and experimental results for the proposed algorithms are presented and analysed to illustrate their validity. Finally, a prototype KDD system named MKDDS is developed by integrating all the proposed algorithms; it is briefly introduced in the appendix.
Beyond their theoretical and methodological significance for KDD, the achievements of this dissertation have great practical value for transforming technological achievements into productive forces and for realizing intelligent commerce.
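
The abstract's notion of a personalized reduct, in which superfluous condition attributes are removed according to the user's bias, can be illustrated with a minimal sketch. This is not DA-FPR itself: it does not construct SP' or use the two operations named in the abstract. It is a plain greedy backward elimination from rough set practice, assuming a consistent decision table. The data and the names `ROWS`, `ATTRS`, `is_consistent`, and `preferred_reduct` are hypothetical.

```python
# Hypothetical decision table: three condition attributes and a decision "buys".
ROWS = [
    {"age": "young", "income": "low", "region": "north", "buys": "no"},
    {"age": "young", "income": "low", "region": "south", "buys": "no"},
    {"age": "senior", "income": "high", "region": "north", "buys": "yes"},
    {"age": "senior", "income": "high", "region": "south", "buys": "yes"},
]
ATTRS = ["age", "income", "region"]


def is_consistent(rows, attrs, decision):
    """True if objects that agree on attrs always agree on the decision."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, row[decision]) != row[decision]:
            return False
    return True


def preferred_reduct(rows, attrs, decision, drop_first):
    """Greedy backward elimination of condition attributes.

    drop_first lists attributes from least to most preferred by the user;
    an attribute is dropped whenever the table stays consistent without it.
    """
    kept = list(attrs)
    for a in drop_first:
        trial = [b for b in kept if b != a]
        if a in kept and is_consistent(rows, trial, decision):
            kept = trial
    return kept


# A user who cares most about income tries to drop it last.
print(preferred_reduct(ROWS, ATTRS, "buys", ["region", "age", "income"]))
# A user who cares most about age tries to drop it last instead.
print(preferred_reduct(ROWS, ATTRS, "buys", ["region", "income", "age"]))
```

In this toy table both `age` and `income` alone determine the decision, so the order in which attributes are offered for removal decides which of the two reducts is returned. That is the sense in which a user's bias can select among several valid reducts.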
Keywords/Search Tags: KDD, classification model, reduct, granular computation, decision tree, rough set, granular evolutionary computation, Agent