The support vector machine is a classification algorithm proposed by Vapnik et al. It has been widely used in machine learning and data mining because of its excellent generalization performance. Conventional classification algorithms assume that misclassifying samples from different classes incurs the same cost. In many real applications, however, such as disease diagnosis and credit card fraud detection, misclassifying samples from different classes leads to different costs. To handle this kind of application, known as the cost-sensitive problem, researchers have proposed many cost-sensitive algorithms. Among them, the cost-sensitive support vector machine offers excellent performance and wide adaptability. In this paper, we focus on the cost-sensitive support vector machine. Our main research results are as follows.

(1) For the cost-sensitive problem, we design a series of comparative experiments covering multiple cost-sensitive algorithms. The experiments use fourteen datasets, including ten cost-sensitive datasets and four imbalanced datasets. We evaluate the algorithms with four performance measures that are widely used for cost-sensitive learning: total cost, AUC, F1-measure, and G-mean. The comparative experiments show that the cost-sensitive support vector machine outperforms the other algorithms and adapts well to datasets from different fields.

(2) We propose a fast batch algorithm for the cost-sensitive support vector machine. Like its cost-insensitive counterpart, the cost-sensitive support vector machine is trained by solving a quadratic programming problem, so it can be solved with the SMO algorithm. We first give the derivation of the SMO algorithm for the cost-sensitive support vector machine and analyze its time complexity. We then propose an algorithmic framework that speeds up SMO with stochastic gradient descent. Finally, we design a comparative experiment to verify the effectiveness of the proposed framework, and the analysis of several key indicators of the algorithm confirms the time complexity analysis above.

(3) To handle online learning applications, we propose a fast multiple-incremental algorithm for the cost-sensitive support vector machine. Batch algorithms cannot handle online learning well because they must retrain on all existing samples whenever new samples are added to the training set, whereas incremental algorithms can absorb the added samples and update the existing model without retraining from scratch. We first derive the multiple-incremental cost-sensitive support vector machine, then demonstrate its effectiveness and efficiency through an experimental study. We also analyze the underlying reasons for its efficiency with further experiments.
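
For readers unfamiliar with the formulation, the cost-sensitive support vector machine replaces the single penalty parameter C of the standard soft-margin SVM with class-dependent penalties (for example C+ for the positive class and C- for the negative class), so that errors on the more expensive class are punished more heavily. The following sketch illustrates this idea and the total-cost measure using scikit-learn's class_weight mechanism; the library, the synthetic data, and the cost values are illustrative assumptions, not the implementation studied in this paper.

```python
# A minimal sketch (not the thesis implementation) of the core idea behind a
# cost-sensitive SVM: the misclassification penalty C is split per class, so
# errors on the costly class (here the positive/minority class) are weighted
# more heavily in the hinge-loss term of the quadratic program.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC

# Synthetic, imbalanced binary data purely for illustration.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hypothetical misclassification costs: a false negative costs 5x a false positive.
cost_fn, cost_fp = 5.0, 1.0

# class_weight scales C per class, i.e. C+ = C * 5 and C- = C * 1, which is
# one standard way to make the soft-margin SVM cost sensitive.
clf = SVC(kernel="rbf", C=1.0, class_weight={1: cost_fn, 0: cost_fp})
clf.fit(X_train, y_train)

# The "total cost" evaluation measure mentioned above: weight each error type
# by its cost and sum.
tn, fp, fn, tp = confusion_matrix(y_test, clf.predict(X_test)).ravel()
total_cost = cost_fn * fn + cost_fp * fp
print(f"false negatives={fn}, false positives={fp}, total cost={total_cost}")
```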