The Research Of Multi-relational Classification Algorithm Based On ILP

Posted on: 2010-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: L Cheng
Full Text: PDF
GTID: 2178360278966796
Subject: Computer software and theory
Abstract/Summary:
Multi-relational data mining is one of the rapidly developing subfields of data mining. Multi-relational classification, an important data mining technique, has been widely applied in areas such as financial decision making and medical research. Many classification approaches have been developed, including decision tree classification and rule-based classification. Inductive Logic Programming (ILP) is the core method of relational learning and is widely studied and applied. Classification algorithms based on ILP include FOIL, TILDE, and CrossMine. Although these algorithms can handle multi-relational data, their classification accuracy, running efficiency, and scalability with growing data volume and schema complexity are limited. In particular, they do not cope well with data sets whose class distributions are imbalanced.

The goal of this thesis is to build an ILP-based multi-relational classification algorithm that can cope with the complex multi-relational data found in the real world. The algorithm is intended to achieve better classification accuracy, higher running efficiency, and better scalability with respect to the number of relations.

First, the thesis studies Inductive Logic Programming and the classical ILP classification algorithm FOIL. To address the problems that traditional ILP classification has with large, complex real-world databases, the tuple ID propagation technique from CrossMine is applied to multi-relational classification, which to a large extent reduces space and time complexity. Then, because the distribution of the training samples changes dynamically while the algorithm executes, a hybrid sampling technique is used to handle the imbalanced class distribution, so that the training set is classified accurately and both the overall classification accuracy and the accuracy on rare classes improve.
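The idea behind tuple ID propagation can be illustrated with a small sketch. This is not the thesis's implementation: the relation names (`loans`, `accounts`), the column names, and the helper functions `propagate_ids` and `count_covered` are hypothetical, chosen only to show how the IDs of target tuples can be attached to the rows of a joined relation so that a candidate condition on that relation can be scored without materializing the join.

```python
from collections import defaultdict


def propagate_ids(target_rows, other_rows, target_key, other_key):
    """For each row of other_rows, return the set of target tuple IDs joinable with it."""
    ids_by_key = defaultdict(set)
    for row in target_rows:
        ids_by_key[row[target_key]].add(row["id"])
    # One ID set per non-target row, in the same order as other_rows.
    return [ids_by_key.get(row[other_key], set()) for row in other_rows]


def count_covered(other_rows, propagated, labels, predicate):
    """Count positive and negative target tuples covered by a condition on other_rows."""
    covered = set()
    for row, ids in zip(other_rows, propagated):
        if predicate(row):
            covered |= ids
    pos = sum(1 for i in covered if labels[i])
    return pos, len(covered) - pos


# Hypothetical toy data: loans is the target relation, joined to accounts on account_id.
loans = [
    {"id": 1, "account_id": 10, "label": True},
    {"id": 2, "account_id": 10, "label": False},
    {"id": 3, "account_id": 11, "label": True},
    {"id": 4, "account_id": 12, "label": False},
]
accounts = [
    {"account_id": 10, "frequency": "monthly"},
    {"account_id": 11, "frequency": "weekly"},
    {"account_id": 12, "frequency": "monthly"},
]

labels = {row["id"]: row["label"] for row in loans}
propagated = propagate_ids(loans, accounts, "account_id", "account_id")
pos, neg = count_covered(
    accounts, propagated, labels, lambda r: r["frequency"] == "monthly"
)
```

The positive and negative counts obtained this way are the inputs a FOIL-style rule learner needs to evaluate a candidate literal; only ID sets are carried across the join, not whole tuples.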
Finally, the classification rules are pruned according to certain criteria so that they are concise and effective.

To compare and analyze the improved ILP multi-relational classification algorithm against others, the experiments use a synthetic database and a financial database that is widely used in the data mining field. The experimental results show that the improved algorithm, a multi-relational classification algorithm based on hybrid sampling, achieves better classification accuracy and higher running efficiency, and handles imbalanced data sets in multi-relational databases more accurately. This work belongs to the theoretical and foundational research of the multi-relational data mining field, and it has both theoretical significance and practical value.
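The abstract does not specify how the hybrid sampling is carried out. As a rough illustration only, hybrid sampling is commonly understood as combining undersampling of the majority class with oversampling of the minority class. The sketch below assumes that reading; the function name `hybrid_sample`, the `target_per_class` parameter, and the toy data are hypothetical, and the sketch resamples once up front rather than dynamically during rule construction as the thesis describes.

```python
import random


def hybrid_sample(examples, labels, target_per_class, seed=0):
    """Resample so that every class ends up with target_per_class examples.

    Classes larger than the target are undersampled without replacement;
    smaller classes keep all their examples and are oversampled with replacement.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)

    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) >= target_per_class:
            chosen = rng.sample(xs, target_per_class)
        else:
            extra = [rng.choice(xs) for _ in range(target_per_class - len(xs))]
            chosen = xs + extra
        out_x.extend(chosen)
        out_y.extend([y] * target_per_class)
    return out_x, out_y


# Hypothetical imbalanced training set: 90 negative and 10 positive examples.
examples = list(range(100))
labels = ["neg"] * 90 + ["pos"] * 10
xs, ys = hybrid_sample(examples, labels, target_per_class=30)
```

Resampling toward a common class size keeps rare-class examples from being swamped when candidate rules are scored, which is the effect the abstract attributes to hybrid sampling.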
Keywords/Search Tags: multi-relational data mining, classification, inductive logic programming, CrossMine