A Research About The Influence Of The Distribution Feature On The Effectiveness Of Machine Learning Algorithms

Posted on:2012-07-23

Degree:Master

Type:Thesis

Country:China

Candidate:Y H Wang

Full Text:PDF

GTID:2218330344950794

Subject:Computer application technology

Abstract/Summary:

This thesis is a preliminary attempt to describe how machine learning algorithms depend on the distribution features of a data set, using three research results as examples. The first example concerns the decision tree algorithm. Under "same side splitting", some purity functions become ineffective: they cannot detect the "increase of information", so the resulting decision tree stops growing, while other purity functions do not fail in this way. The thesis analyzes the reason for this failure in detail and characterizes the category of failing functions. The second example concerns the support vector machine (SVM) algorithm and points out the torsion introduced by kernel functions. Kernel functions are mappings introduced for data sets that are not linearly separable. A kernel function maps the original data space to a feature space in which the image of the data set may become linearly separable, so the optimization methods valid for linearly separable data sets can be expected to be similarly effective for data sets that are not linearly separable. The discovery of kernel torsion, however, indicates that "distance comparison" in the original space and in the feature space may not be consistent with each other: images with larger distances in the feature space do not necessarily correspond to data points with larger distances in the original space. The SVM algorithm combined with a kernel function therefore contains a logical flaw. The third example presents a new way to determine the parameters of the polynomial kernel, together with a resulting instructive inequality that describes how the distribution features relate to the suitable number of dimensions of the feature space. The thesis also describes the integrity style of SVM algorithms.
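The distance inconsistency described in the second example can be checked numerically. The sketch below is only an illustration and is not taken from the thesis: the kernel (an inhomogeneous polynomial kernel of degree 2 on scalars), the two point pairs, and the function names `poly_kernel` and `feature_distance` are all assumptions chosen for the demonstration. It uses the standard identity ||phi(x) - phi(y)||^2 = k(x, x) - 2 k(x, y) + k(y, y) to compute feature-space distances without constructing the feature map.

```python
import math


def poly_kernel(x, y, c=1.0, degree=2):
    """Inhomogeneous polynomial kernel on scalars: (x*y + c) ** degree.

    The choice of c=1 and degree=2 is an assumption made for this demo.
    """
    return (x * y + c) ** degree


def feature_distance(x, y, kernel=poly_kernel):
    """Distance between the images of x and y in the feature space.

    Uses ||phi(x) - phi(y)||^2 = k(x, x) - 2*k(x, y) + k(y, y).
    """
    squared = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(squared, 0.0))  # guard against tiny negative round-off


# Two hypothetical pairs of one-dimensional data points.
pair_a = (-1.0, 1.0)  # original distance 2
pair_b = (2.0, 3.0)   # original distance 1

orig_a = abs(pair_a[0] - pair_a[1])
orig_b = abs(pair_b[0] - pair_b[1])
feat_a = feature_distance(*pair_a)  # squared feature distance is 8
feat_b = feature_distance(*pair_b)  # squared feature distance is 27

print("pair A: original", orig_a, "feature", feat_a)
print("pair B: original", orig_b, "feature", feat_b)

# Pair A is farther apart in the original space but closer in the feature space.
order_reversed = orig_a > orig_b and feat_a < feat_b
print("distance order reversed:", order_reversed)
```

For these two pairs the ordering of distances flips between the original space and the feature space, which is the kind of inconsistency the abstract attributes to kernel torsion. Whether this matches the thesis's formal definition of torsion cannot be confirmed from the abstract alone.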

Keywords/Search Tags:

machine learning, decision tree, purity function, support vector machine, kernel function, torsion, parameter selection, integrity style method

Related items

1	Research On Support Vector Machines And Kernel Methods
2	Contributions To Several Issues Of Machine Learning Method Based On Support Vector Machine And Fuzzy System
3	The Method Of Parameter Selection On Kernel Function Of SVM
4	Research On Kernel Function And Parameter Selection In Support Vector Machine And Its Application
5	Research On Some Variants Of Support Vector Machine
6	Research On Kernel Method For Support Vector Machine And Its Application In Forest Fire Video Recognition
7	Some Research On Support Vector Machine
8	Research On Model Selection For Support Vector Machine
9	Support Vector Machine And Its Application On Texture Classification
10	Research On Method And Application Of Fuzzy Support Vector Machine With Feature Selection