A Research About The Influence Of The Distribution Feature On The Effectiveness Of Machine Learning Algorithms

Posted on: 2012-07-23  Degree: Master  Type: Thesis
Country: China  Candidate: Y H Wang  Full Text: PDF
GTID: 2218330344950794  Subject: Computer application technology
Abstract/Summary:
This thesis is a first attempt to describe how machine learning algorithms depend on the distribution features of the data set. It presents three research results as examples. The first example concerns the decision tree algorithm. Under "same side splitting", some purity functions become ineffective: they cannot detect the "increase of information", so the resulting decision tree stops growing, while other purity functions do not fail in this way. The thesis analyzes in detail why this failure occurs and characterizes the category of failing functions. The second example concerns the support vector machine (SVM) algorithm, where the torsion introduced by kernel functions is pointed out. Kernel functions are mappings introduced for situations where data sets are not linearly separable. A kernel function maps the original data space to a feature space in which the image of the data set may become linearly separable, so the optimization methods that are valid for linearly separable data sets can be expected to be similarly effective for data sets that are not linearly separable. The discovery of the kernel torsion, however, indicates that "distance comparison" in the original space and in the feature space may not be consistent with each other: images with larger distances in the feature space do not necessarily correspond to data points with larger distances in the original space. The SVM algorithm combined with a kernel function therefore contains a logical flaw. The third example presents a new way of determining the polynomial kernel parameters and a resulting instructive inequality. This inequality describes how the distribution features are associated with the suitable number of dimensions of the feature space. The thesis also describes the integrity style of SVM algorithms.
Keywords/Search Tags:machine learning, decision tree, purity function, support vector machine, kernel function, torsion, parameter selection, integrity style method
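
The abstract does not give the thesis's formal definitions, so the following Python sketch is only one possible reading of the first two examples, not the author's method. It assumes that "same side splitting" means a binary split whose two children keep the same majority class as the parent, and that the "torsion" can be observed by comparing the squared feature-space distance k(x,x) + k(y,y) - 2k(x,y) with the original distance. The function names, the class counts, the choice of degree-2 polynomial kernel, and the sample points are all hypothetical.

```python
import math

# --- Purity (impurity) functions for a two-class node, p = fraction of class 1 ---

def misclassification(p):
    return min(p, 1.0 - p)

def gini(p):
    return 2.0 * p * (1.0 - p)

def entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def impurity_gain(impurity, parent, left, right):
    """Impurity decrease of a binary split; each node is a (n_class1, n_class0) pair."""
    def frac(counts):
        return counts[0] / (counts[0] + counts[1])
    n = parent[0] + parent[1]
    n_left = left[0] + left[1]
    n_right = right[0] + right[1]
    return (impurity(frac(parent))
            - (n_left / n) * impurity(frac(left))
            - (n_right / n) * impurity(frac(right)))

# Hypothetical split: class 1 is the majority in the parent and in both children
# (the assumed meaning of "same side splitting").
parent, left, right = (80, 20), (50, 5), (30, 15)

gain_misclassification = impurity_gain(misclassification, parent, left, right)
gain_gini = impurity_gain(gini, parent, left, right)
gain_entropy = impurity_gain(entropy, parent, left, right)

# --- Distances in the feature space induced by a kernel (1-D inputs for simplicity) ---

def poly_kernel(x, y, degree=2, c=1.0):
    """Polynomial kernel (x*y + c)^degree for scalar inputs."""
    return (x * y + c) ** degree

def feature_dist_sq(kernel, x, y):
    """Squared distance between the feature-space images of x and y."""
    return kernel(x, x) + kernel(y, y) - 2.0 * kernel(x, y)

# Hypothetical pairs: (0, 1) is farther apart than (2, 2.5) in the original space.
far_pair, near_pair = (0.0, 1.0), (2.0, 2.5)
far_pair_feature = feature_dist_sq(poly_kernel, *far_pair)
near_pair_feature = feature_dist_sq(poly_kernel, *near_pair)

if __name__ == "__main__":
    print("gain (misclassification):", gain_misclassification)
    print("gain (gini):             ", gain_gini)
    print("gain (entropy):          ", gain_entropy)
    print("feature dist^2, far pair: ", far_pair_feature)
    print("feature dist^2, near pair:", near_pair_feature)
```

Under these assumptions the misclassification-error gain is zero (up to rounding) although the split clearly changes the class proportions, whereas the Gini and entropy gains are positive. Likewise, the pair that is farther apart in the original space ends up closer in the feature space, so the ordering of distances is not preserved.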