A Method Of Multi-Relational Data Classification Of Continuous Attributes

Posted on: 2010-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Zhang
GTID: 2178360278966877
Subject: Computer application technology
Abstract/Summary:
Data mining is one of the most significant fields of information technology. It combines theories and techniques from databases, artificial intelligence, machine learning and statistics. Classification is among the most active and mature topics in data mining because it has been used successfully in business analysis. At present there are many classification techniques, such as decision tree induction, association rules, Bayesian networks, neural networks, rough sets and statistical models. Such classification methods have been widely researched and applied for their high predictive accuracy and execution speed; moreover, their results can easily be transformed into "if-then" classification rules.
Multi-Relational Data Mining (MRDM) is the multi-disciplinary field dealing with knowledge discovery from relational databases consisting of multiple tables. Mining data that consists of complex or structured objects also falls within the scope of the field, since the normalized representation of such objects in a relational database requires multiple tables. The field aims to integrate results from existing fields such as inductive logic programming, KDD, machine learning and relational databases, and to produce new techniques for mining multi-relational data.
This thesis explores a new algorithm that combines kernel functions with a distance-based method, and accordingly proposes a classification algorithm for multi-relational data with continuous attributes. In a database, some data are discrete, such as people's names or addresses, and some are continuous, such as weight or height. This thesis discusses only continuous data, because the distance between such data can be computed with the Euclidean distance formula. In effect, the algorithm is another kind of distance-based method. As the old saying goes, birds of a feather flock together: data of the same kind tend to be distributed in the same region.
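For reference, the Euclidean distance mentioned above can be written as follows. The notation is an assumption introduced here for illustration: each tuple is treated as a vector of n continuous attribute values, x = (x_1, ..., x_n) and y = (y_1, ..., y_n).

```latex
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
```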
One of the simplest methods is to compute the distances between tuples; in this way, data of the same kind can be gathered inside a sphere model. The idea comes from the support vector machine, in which a two-dimensional table is treated not only as a table, where relationships are hard to see, but also as points plotted in a Cartesian coordinate system, where patterns are easy to see. If the classes can be separated by a straight line, a single line divides the different data; otherwise, curves are needed to separate them. Likewise, for the data of a relational database, the attributes can be taken as dimensions and the tuples as points in an N-dimensional space. A suitable sphere can then be found to hold data of the same kind. Because of the complexity of N-dimensional space, a tolerance is allowed for the data of a sphere: the sphere is accepted only if its error rate falls within an acceptable range; otherwise it is computed again. Finally, the center and radius of the sphere are used to represent that kind of data. In some special cases, several spheres of the same kind with different centers and radii can be combined. Future work may try combining the support vector machine and kernel functions to obtain better results.
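The sphere model described above might be sketched as follows. This is a minimal illustration, not the thesis's actual algorithm: the function names `fit_spheres` and `classify`, the choice of the class mean as the center, and the reading of the tolerance as the fraction of a class's tuples allowed to fall outside its sphere are all assumptions made here.

```python
import math
from collections import defaultdict


def fit_spheres(tuples, labels, tolerance=0.1):
    """Fit one sphere (center, radius) per class.

    Assumption: the center is the class mean, and the radius is the smallest
    one that covers at least a (1 - tolerance) fraction of the class's tuples.
    """
    groups = defaultdict(list)
    for point, label in zip(tuples, labels):
        groups[label].append(point)

    spheres = {}
    for label, points in groups.items():
        dims = len(points[0])
        center = tuple(sum(p[d] for p in points) / len(points) for d in range(dims))
        distances = sorted(math.dist(p, center) for p in points)
        # Number of tuples that must lie inside the sphere (at least one).
        keep = max(1, math.ceil((1 - tolerance) * len(distances)))
        spheres[label] = (center, distances[keep - 1])
    return spheres


def classify(point, spheres):
    """Pick the class whose sphere surface is closest to the point.

    The score is (distance to center - radius); it is negative when the
    point lies inside the sphere. This tie-breaking rule is an assumption.
    """
    return min(
        spheres,
        key=lambda label: math.dist(point, spheres[label][0]) - spheres[label][1],
    )


# Hypothetical two-attribute data: each tuple is a point in 2-D space.
training = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0),
            (10.0, 10.0), (10.0, 12.0), (12.0, 10.0), (12.0, 12.0)]
classes = ["a", "a", "a", "a", "b", "b", "b", "b"]
model = fit_spheres(training, classes, tolerance=0.0)
predicted = classify((1.5, 1.0), model)
```

Representing each class only by a center and a radius keeps the model compact. A fuller version would also measure the error rate of each sphere and recompute it when the rate exceeds the tolerance, as the abstract describes.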
Keywords/Search Tags: data mining, multi-relational data mining, distance, support vector, kernel function