
Research On Large-scale Mixed Data Classification With Kernel Methods

Posted on: 2018-09-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S L Peng
GTID: 1368330596497260
Subject: Computer application technology
Abstract/Summary:
In practical applications, a large amount of data is mixed, containing both categorical and numerical attributes. Designing machine learning methods that extract valuable knowledge from large-scale mixed data is an active research topic. Kernel methods have a solid theoretical foundation and are well suited to nonlinear machine learning. This dissertation addresses the demand for large-scale mixed data classification and studies kernel methods and optimization algorithms for this task. The main contributions are as follows.

(1) A new support vector machine algorithm is proposed that extracts the spatial structure information implied in categorical data. The main difficulty in learning from mixed data is handling categorical attributes, because their spatial structure is not clear. This dissertation presents a new numerical processing method for categorical attributes that extracts this information: it maps categorical attributes into a real-valued space according to a performance criterion (such as generalization error). This mapping improves classification performance. Based on the mapping strategies, we propose linear and nonlinear support vector machine algorithms, respectively.

(2) A new kernel function is proposed for categorical data, and a multiple kernel learning algorithm for mixed data is designed around it. Based on the characteristics of categorical data, we propose a new ratio Gaussian kernel function for categorical attributes and a fast one-of-N coding method. Categorical and numerical attributes are processed separately. For categorical attributes, different kernel matrices are computed according to the encoding methods. For numerical attributes, multiple kernel matrices are computed from different kernel functions and hyper-parameters. The experimental results show that this method can improve classification performance on mixed data.

(3) Fast optimization algorithms are proposed for the linear support vector machine, the nonlinear support vector machine and multiple kernel learning. A new discriminant criterion for the support vector machine is derived from the Karush-Kuhn-Tucker conditions, and a stochastic sequential minimal optimization algorithm is designed. This algorithm not only speeds up the linear algorithm on large-scale data but also keeps the bias term of the discriminant function during optimization. Moreover, we propose a new optimal step-size working set selection strategy. This strategy reduces the number of iterations of the nonlinear support vector machine and the SMO-MKL algorithm, and thus accelerates training.

Our new algorithms not only extend the application fields of kernel methods but also provide a new idea for handling mixed data, heterogeneous data and multi-modal data. The mapping method can also be applied to heterogeneous and multi-modal data learning, and we will examine these applications in a future study.
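As a rough illustration of the mixed-data kernel idea described in contribution (2), the sketch below one-of-N encodes the categorical attributes, builds one kernel matrix from that encoding and several Gaussian kernel matrices from the numerical attributes, and combines them with fixed weights. Everything here is an assumption made for illustration: the function names, the simple matching kernel used for categorical attributes (it is not the dissertation's ratio Gaussian kernel, which the abstract does not define), the `gammas` values, and the uniform weights, which stand in for weights that a multiple kernel learning algorithm would learn.

```python
import numpy as np

def one_of_n(column):
    """One-of-N (one-hot) encode a 1-D array of categorical labels."""
    categories, inverse = np.unique(column, return_inverse=True)
    encoded = np.zeros((len(column), len(categories)))
    encoded[np.arange(len(column)), inverse] = 1.0
    return encoded

def categorical_kernel(cat_data):
    """Fraction of categorical attributes on which two samples agree.

    Assumption: a plain matching kernel, computed as a linear kernel
    on the one-of-N encoding and scaled by the number of attributes.
    """
    blocks = [one_of_n(cat_data[:, j]) for j in range(cat_data.shape[1])]
    encoded = np.hstack(blocks)
    return encoded @ encoded.T / cat_data.shape[1]

def gaussian_kernel(num_data, gamma):
    """Gaussian (RBF) kernel matrix exp(-gamma * ||x - y||^2)."""
    sq_norms = np.sum(num_data ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * num_data @ num_data.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def mixed_kernel(cat_data, num_data, gammas=(0.1, 1.0), weights=None):
    """Weighted sum of one categorical kernel and several Gaussian kernels.

    Assumption: uniform weights unless given; an MKL solver would
    learn these weights instead.
    """
    kernels = [categorical_kernel(cat_data)]
    kernels += [gaussian_kernel(num_data, g) for g in gammas]
    if weights is None:
        weights = np.full(len(kernels), 1.0 / len(kernels))
    return sum(w * k for w, k in zip(weights, kernels))

# Toy mixed data: two categorical and two numerical attributes.
cat_data = np.array([["red", "S"], ["red", "L"], ["blue", "L"]])
num_data = np.array([[0.0, 1.0], [0.5, 1.0], [3.0, -1.0]])
K = mixed_kernel(cat_data, num_data)
```

The resulting matrix `K` is symmetric and positive semidefinite, so it could be passed to any SVM solver that accepts a precomputed kernel matrix.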
Keywords/Search Tags:Mixed data, Categorical data mapping, Kernel method, Support vector machine, Working set selection