Research Of Multiattribute And Large-scale Data Classification Algorithm Based On Support Vector Machine

Posted on:2008-01-10

Degree:Master

Type:Thesis

Country:China

Candidate:T M Hou

GTID:2178360272469419

Subject:Systems Engineering

Abstract/Summary:

Data mining aims to extract novel and useful knowledge from large volumes of data. Classification, a basic problem in data mining, predicts the class labels of data using supervised knowledge obtained from empirical data. The Support Vector Machine (SVM) has become a focus of machine learning research because of its excellent learning performance, and it has been applied successfully in many fields. As a relatively new technique, however, SVM still has many shortcomings that need to be studied in the context of data mining.

Starting from the basic concepts of SVM theory and its training algorithms, this thesis discusses the Sequential Minimal Optimization (SMO) algorithm. SMO is efficient on large-scale training sets, but it still has shortcomings, including slow training and a large memory requirement. This thesis presents double SMO, an improved SVM training algorithm. Double SMO first uses SMO to find an approximate separating hyperplane on a sample of the data set. Support vectors are then collected according to this approximate hyperplane, and SMO is run a second time to obtain the final separating hyperplane. With double SMO, the memory requirement is reduced, the impact of noise points is eliminated, and training is faster.

Data sets in data mining are mostly multiattribute and large-scale, so attribute reduction should be carried out before double SMO is applied. Attribute reduction lowers the amount of computation, speeds up training, and makes the classification model easier to understand. The thesis therefore discusses attribute reduction for the multiattribute problem in data mining and derives a double SMO algorithm with attribute reduction. This algorithm is well suited to classification in data mining and provides a theoretical foundation for constructing a data mining scheme.

In this thesis, double SMO is tested on a two-dimensional data set, and a data mining scheme based on double SMO with attribute reduction is proposed. The experiments show that the algorithm improves on the performance of SMO: training is faster, the memory requirement is lower, and its accuracy exceeds that of Decision Tree, Bayesian, and Neural Network classifiers. This thesis introduces SVM into the data mining field, offering a new option for the design of data mining systems.
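
The two-stage idea behind double SMO can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not say how the sample is drawn or how support vectors are collected, so everything below is an assumption made for illustration: the function names `smo_train` and `double_smo`, the linear kernel, a simplified SMO that picks the second multiplier at random, uniform random sampling, and the `band` threshold that keeps points whose decision value lies close to the approximate hyperplane.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def smo_train(X, y, C=1.0, tol=1e-3, max_passes=5, max_sweeps=100, seed=0):
    """Simplified SMO for a linear soft-margin SVM (needs >= 2 points). Returns (w, b)."""
    rng = random.Random(seed)
    n = len(X)
    alpha, b = [0.0] * n, 0.0
    K = [[dot(X[i], X[j]) for j in range(n)] for i in range(n)]  # linear kernel matrix

    def f(i):  # decision value for training point i
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n) if alpha[k] > 0) + b

    passes = 0
    for _ in range(max_sweeps):
        changed = 0
        for i in range(n):
            Ei = f(i) - y[i]
            if not ((y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0)):
                continue  # point i satisfies the KKT conditions within tol
            j = rng.choice([k for k in range(n) if k != i])  # random second multiplier
            Ej = f(j) - y[j]
            ai, aj = alpha[i], alpha[j]
            if y[i] != y[j]:
                lo, hi = max(0.0, aj - ai), min(C, C + aj - ai)
            else:
                lo, hi = max(0.0, ai + aj - C), min(C, ai + aj)
            eta = 2 * K[i][j] - K[i][i] - K[j][j]
            if lo == hi or eta >= 0:
                continue
            new_aj = min(hi, max(lo, aj - y[j] * (Ei - Ej) / eta))
            if abs(new_aj - aj) < 1e-5:
                continue  # negligible step, leave both multipliers unchanged
            new_ai = min(C, max(0.0, ai + y[i] * y[j] * (aj - new_aj)))
            b1 = b - Ei - y[i] * (new_ai - ai) * K[i][i] - y[j] * (new_aj - aj) * K[i][j]
            b2 = b - Ej - y[i] * (new_ai - ai) * K[i][j] - y[j] * (new_aj - aj) * K[j][j]
            alpha[i], alpha[j] = new_ai, new_aj
            b = b1 if 0 < new_ai < C else b2 if 0 < new_aj < C else (b1 + b2) / 2
            changed += 1
        passes = passes + 1 if changed == 0 else 0
        if passes >= max_passes:
            break
    w = [sum(alpha[k] * y[k] * X[k][d] for k in range(n)) for d in range(len(X[0]))]
    return w, b


def double_smo(X, y, sample_size, band=1.5, C=1.0, seed=0):
    """Two-stage training sketch. Returns (w, b, number of points used in stage 2)."""
    rng = random.Random(seed)
    # Stage 1: approximate hyperplane from a random sample (assumed sampling rule).
    idx = rng.sample(range(len(X)), sample_size)
    w0, b0 = smo_train([X[i] for i in idx], [y[i] for i in idx], C=C, seed=seed)
    # Collect candidate support vectors: points near the approximate hyperplane
    # (the band threshold is an assumption, not taken from the thesis).
    near = [i for i in range(len(X)) if abs(dot(w0, X[i]) + b0) <= band]
    if len({y[i] for i in near}) < 2:
        return w0, b0, sample_size  # fallback: keep the stage-1 hyperplane
    # Stage 2: retrain on the candidates only.
    w, b = smo_train([X[i] for i in near], [y[i] for i in near], C=C, seed=seed)
    return w, b, len(near)


def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1
```

Under these assumptions, `double_smo(X, y, sample_size=30)` takes lists of feature vectors and labels in {+1, -1} and returns the weight vector, the bias, and the number of points passed to the second SMO run. Because the second run sees only the points near the approximate hyperplane, its kernel matrix can be much smaller than one built on the full data set, which is one plausible reading of the reduced memory requirement claimed in the abstract.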

Keywords/Search Tags:

data mining, support vector machine, training algorithm, sequential minimal optimization algorithm, attribute reduction

Related items

1	Research On Application Of Support Vector Machine To Chinese Medical Diagnosis
2	Research Of Data Mining Techniques Based On Support Vector Machines
3	An Improvement Sequential Minimal Optimization Algorithm Of Support Vector Machines
4	Support Vector Machine Training Algorithm And Its Improvement
5	Acceleration And Application Of Support Vector Machines
6	Research On The Method Of Data Normalization For Improving SVM Training Efficiency
7	The Research And Optimization On Support Vector Machines Algorithm
8	The Study Of Agricultural Data Classification Based On Support Vector Machine
9	The Improvement Research Of The Support Vector Machine Pretreatment On SMO Algorithm
10	Research On Support Vector Machine Learning Algorithms