Font Size: a A A

Genetic Inductive Logic Programming Research

Posted on:2004-09-19Degree:DoctorType:Dissertation
Country:ChinaCandidate:X W YangFull Text:PDF
GTID:1118360092992028Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Data Mining technique is one of the key research fields in the current computer technology. Most current researches of Data Mining are limited in the proposition logic framework, in which there exist disadvantages of weak representation power and not being able to utilize naturally background knowledge. Furthermore, most researches have used the "single-table assumption" and find out a pattern from data stored in single-table. But data is usually stored in multiple tables of relational database. If one wants to use these existing methods, he will have to overcome the problem of how to squeeze data into single-table.First-order rule mining technique based on first-order logic is often called as Inductive Logic Programming (ILP). First-order logic provides ILP with a uniform and very expressive means of representation: the background knowledge and the examples, as well as the mined rules, can all be represented as formulas in a clausal language. So we can use naturally background knowledge in the mining process. Furthermore, the mined knowledge is represented as first-order rules constructed by certain predicates, which is more powerful in description capacity than proposition rule. The representation allows for knowledge with more abundant intension and more easily understood. Therefore, ILP can overcome two key limitations existing in proposition rule mining methods: the limitation of description capability and the limitation of utilizing background knowledge. In addition, because there is internal relationship between the formalism of relational database, " relational algebra", and the clausal logic used by ILP, ILP methods can directly deal with mining task involved in multiple tables of relational database.First-order rule mining can be viewed as the search through the first-order rule space. Consider the hugeness and the extreme complexity of first-order rule space, in order to carry out an effective search, most ILP systems have used greedy search strategy and required to explicitly present strong language bias related to the mining task, the structure feature description of first-order rules explored in mining process, to reduce the search range or is used as heuristics to guide the search. The greedy search strategy may let search trapped in a local maximum. The additional language bias produces good search effect only when the target rules match it, which is not applicable for the fact that the prior knowledge about the structures of target rules is usually lack in Data Mining task.Genetic Algorithm (GA) simulating biologic evolution mechanism is stochastic-like search method. It guides the search direction according to probabilistic transition regularity. It utilizes the information acquired and accumulated in evolution process to organize search by itself and adaptively control the search direction. It needs little knowledge about search space or other auxiliary information, which fits the Data Mining task. GA adopts population search strategy, so it has strong global search performance and reduces the risk of trapping into local maximum.Hence, in order to enhance the robustness and adaptability of first-order rule mining technique and solve t he performance problem of first-order rule mining, ILP can adopt genetic algorithm as the search strategy.This dissertation concentrates on the initial research work in genetic inductive logic programming technique. Using genetic algorithm to mine the first-order rules depends on two factors: the "landscape" in genetic space and the "navigation" in genetic space. The two factors reflect respectively the static and dynamic features of algorithm. The " landscape" in genetic space describes the static feature of algorithm, which is associated with the triples: (1) The coding, which transforms first-order rule into the representation applicable to genetic operation. By the coding, all first-order rules needed to explore are mapped into the points in genetic space; (2) The fitness function, which evaluates the quality of first-order rule. The variati...
Keywords/Search Tags:Data Mining, Inductive Logic Programming, Genetic Algorithm, First-order Rule
PDF Full Text Request
Related items