Font Size: a A A

Research On Kernels For Structured Data

Posted on:2009-10-22Degree:DoctorType:Dissertation
Country:ChinaCandidate:C H YinFull Text:PDF
GTID:1118360242989841Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Kernel function is an important idea in Machine Learning, especially in Support Vector Machine (SVM). Kernel function had been introduced into the context of machine learning before SVM was presented. However, the first successful application of kernel function is SVM. The development of kernel machine learning depends on the research of the combination of kernel function and SVM. This dissertation is based on the combination of kernel function and SVM, including constructions, implementations, and applications of the kernel function.In general, SVM takes a feature vector representation as its input, where some usual kernel functions such as polynomial kernel and RBF kernel are used. However, in the real world, many machine learning problems often work on special data types, such as strings and images. These special data contain some information of their structure, which might be useful in problem solving. We denote these special data by structured data. The usual kernel function may be not applicable for the problem involving structured data, because some information of these structured data may be lost in the mapping to feature vector. Therefore, many efforts were made to construct kernel functions for structured data and to compute these functions efficiently.This dissertation takes the kernel functions for string (denoted by string kernels) as research object, and presents some novel string kernels and algorithms for these kernels. Moreover, a bit-parallel technique is introduced to speed up the computation of fixed length string kernels. Finally, these string kernels are used in the context of intrusion detection.(1) The construction of new string kernels. String kernels can be divided into two kinds, one is subsequence-based and another is probability-based. The existing gap-weighted kernels and spectrum kernels belong to the former. The spectrum kernels did not consider the contribution of non-contiguous subsequences, and the gap-weighted kernels penalized the longer subsequences rather than encouraged them. By observing these disadvantages of existing string kernels in certain context, we presented a subsequence-based string kernel, which is called length-weighted kernel, in which the longer subsequence can contribute more to the kernel value. Moreover, a variant of length-weighted kernel is proposed, which is called length-weighted once kernel, in which all subsequences will be considered only once regardless of whether they occur only once or many times in a string. Other than subsequence-based string kernels, the probability-based string kernels take the dependence of characters occur in string into account. In order to consider the dependence of characters, we proposed a probability-based string kernel, which is based on Markov model and called Markov kernel.(2) The implementation of string kernels. Dynamic programming, suffix kernel, and suffix tree are three general approaches to compute the value of string kernels. After analyzing the idea of suffix kernels, we presented a series of algorithms based on suffix kernels to compute existing gapped kernels and the length-weighted kernel as well as its variant. Moreover, a bit-parallel technique is introduced to accelerate the computation of kernel functions. In order to compute the Markov kernel, a suffix tree is used to store the string and its matching statistics are used in the process of computation.(3) Intrusion detection is an important facet of information security. As a classification algorithm, SVM was used widely in the network-based intrusion detection. Other than network-based intrusion detection, most of the input data of host-based intrusion detection are command sequences or system call sequences. Therefore, the usual polynomial and RBF kernels are not applicable for host-based intrusion detection. In the host-based intrusion detection system, we construct a 1 -SVM classifier based on the training data by embedding string kernels, and detect intrusions in the testing data using this 1-SVM classifier. The experimental results reveal that the new string kernels outperform the existing string kernels in the host-based intrusion detection system.
Keywords/Search Tags:Support Vector Machine, Kernel function, Kernels for structured data, String kernels, Length-weighted kernels, Markov kernels, Intrusion Detection
PDF Full Text Request
Related items