Research On Large-scale Mixed Data Classification With Kernel Methods

Posted on:2018-09-12

Degree:Doctor

Type:Dissertation

Country:China

Candidate:S L Peng

GTID:1368330596497260

Subject:Computer application technology

Abstract/Summary:

In practical applications, a large amount of data is mixed, containing both categorical and numerical attributes. Designing machine learning methods that extract valuable knowledge from large-scale mixed data is an active research topic. Kernel methods have a solid theoretical foundation and are well suited to nonlinear machine learning. This dissertation addresses the need for large-scale mixed data classification and studies kernel methods and optimization algorithms for this task. The main contributions are as follows.

(1) A new support vector machine algorithm is proposed that extracts the spatial structure information implicit in categorical data. The difficulty of learning from mixed data lies in handling categorical attributes, whose spatial structure is not clear. This dissertation presents a new numerical processing method for categorical attributes that extracts this spatial structure information. The method maps categorical attributes into a real-valued space according to a performance criterion (such as generalization error), and this mapping improves classification performance. Based on the mapping strategies, linear and nonlinear support vector machine algorithms are proposed, respectively.

(2) A new kernel function is proposed for categorical data, and a multiple kernel learning algorithm for mixed data is designed around it. Based on the characteristics of categorical data, a new ratio Gaussian kernel function for categorical attributes and a fast one-of-N coding method are proposed. Categorical and numerical attributes are processed separately. For categorical attributes, different kernel matrices are computed according to the encoding methods. For numerical attributes, multiple kernel matrices are computed from different kernel functions and hyper-parameters. The experimental results show that this method can improve classification performance on mixed data.

(3) Fast optimization algorithms are proposed for the linear support vector machine, the nonlinear support vector machine and multiple kernel learning. A new discriminant criterion for the support vector machine is derived from the Karush-Kuhn-Tucker conditions, and a stochastic sequential minimal optimization algorithm is designed. This algorithm not only speeds up the linear algorithm on large-scale data but also retains the bias term of the discriminant function during optimization. In addition, a new optimal step-size working set selection strategy is proposed. This strategy reduces the number of iterations of the nonlinear support vector machine and the SMO-MKL algorithm, and thus accelerates training.

The new algorithms not only extend the application fields of kernel methods but also offer a new approach to handling mixed, heterogeneous and multi-modal data. The mapping method can also be applied to heterogeneous and multi-modal data learning; these applications will be examined in a future study.
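
As a rough illustration of the idea in contribution (2), the following Python sketch builds separate kernel matrices for the categorical and numerical attributes and combines them with fixed weights. It is not the dissertation's method: the abstract does not define the ratio Gaussian kernel, the fast one-of-N coding or the learned kernel weights, so a plain one-of-N encoding, a simple overlap (matching) kernel and hand-picked weights stand in for them. All function names and the toy data are hypothetical.

```python
import numpy as np

def one_of_n(cat_column):
    """Plain one-of-N (one-hot) encoding of a 1-D array of category labels."""
    categories, codes = np.unique(np.asarray(cat_column), return_inverse=True)
    encoded = np.zeros((len(codes), len(categories)))
    encoded[np.arange(len(codes)), codes] = 1.0
    return encoded

def overlap_kernel(C):
    """Fraction of categorical attributes on which two samples agree.
    Assumption: a stand-in for the ratio Gaussian kernel named in the abstract."""
    C = np.asarray(C)
    return (C[:, None, :] == C[None, :, :]).mean(axis=2)

def gaussian_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def combine_kernels(kernels, weights):
    """Convex combination of kernel matrices (weights are fixed here,
    whereas multiple kernel learning would optimize them)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * K for w, K in zip(weights, kernels))

# Toy mixed data: two numerical and two categorical attributes per sample.
numeric = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 9.0]])
categorical = np.array([["red", "small"], ["red", "large"], ["blue", "large"]])

# Categorical part: the overlap kernel equals a scaled linear kernel
# on the concatenated one-of-N encodings of each attribute.
m = categorical.shape[1]
E = np.hstack([one_of_n(categorical[:, j]) for j in range(m)])
K_onehot = (E @ E.T) / m
K_cat = overlap_kernel(categorical)

# Numerical part: several Gaussian kernels with different hyper-parameters.
K_num = [gaussian_kernel(numeric, gamma) for gamma in (0.1, 1.0)]

# Combined kernel matrix for the mixed data.
K = combine_kernels([K_cat] + K_num, weights=[0.5, 0.25, 0.25])
```

A matrix such as `K` could then be passed to any SVM solver that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`. Because each component is positive semidefinite and the weights are non-negative, the combination is again a valid kernel.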

Keywords/Search Tags:

Mixed data, Categorical data mapping, Kernel method, Support vector machine, Working set selection

Related items

1	The Selection And Improvement Of Support Vector Machine Kernels
2	The Research And Application Of Wavelet Support Vector Machines In Data Modeling
3	Research On Support Vector Machine Solving The Large-scale Data Set
4	The Research On SVM Kernel Selection Based On The Characteristics Of Data Distribution
5	Research On Method And Application Of Fuzzy Support Vector Machine With Feature Selection
6	Study On Some Issues Of Kernel Machine Learning Method
7	Research On Algorithms Of HRR Target Recognition Based On Kernel Method
8	Research Of Data Analysis And Pre-selection Algorithm For Support Vector Machine Speech Recognition
9	Research On Selection Of Kernel Functions And Key Parameters In Support Vector Machine
10	Researches On Support Vector Machine Learning Approaches Based On Ensemble Learning