
Research Of Multiattribute And Large-scale Data Classification Algorithm Based On Support Vector Machine

Posted on: 2008-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: T M Hou
GTID: 2178360272469419
Subject: Systems Engineering
Abstract/Summary:
Data mining aims to extract novel and useful knowledge from large volumes of data. Classification, a basic problem in data mining, predicts the class label of data using a supervised model learned from empirical data. The Support Vector Machine (SVM) has become a focus of machine learning research because of its excellent learning performance, and it has been applied successfully in many fields. As a relatively new technique, however, SVM still has many shortcomings in the data mining setting that need to be studied.

Building on the basic concepts of SVM theory and its training algorithms, this thesis discusses the sequential minimal optimization (SMO) algorithm. SMO is efficient on large-scale training sets, but it still has shortcomings, including slow training and a large memory requirement. This thesis presents double SMO, an improved SVM training algorithm. It first runs SMO on a sample of the data set to find an approximate separating hyperplane, then collects support vectors according to that approximate hyperplane, and finally runs SMO again to obtain the separating hyperplane. Double SMO reduces the memory requirement, eliminates the impact of noise points, and speeds up training.

Data sets in data mining are mostly multiattribute and large-scale, so attribute reduction should be performed before double SMO is applied. This reduces computation, speeds up training, and makes the classification model easier to understand. Attribute reduction is therefore discussed for the multiattribute problem in data mining, and a double SMO algorithm with attribute reduction is obtained. This algorithm is well suited to classification in data mining and provides a theoretical foundation for constructing a data mining plan.

In this thesis, double SMO is tested on a two-dimensional data set, and a data mining plan based on double SMO with attribute reduction is proposed. The experiments show that the algorithm improves on SMO: training is faster, the memory requirement is lower, and its accuracy exceeds that of decision tree, Bayesian, and neural network classifiers. This thesis introduces SVM into the data mining field, offering a new option for designing data mining systems.
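
The two-pass idea described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation. The abstract does not say how support vectors are collected from the approximate hyperplane, so the sketch assumes a simple rule: keep every point whose functional margin under the first-pass hyperplane is at most a threshold (the hypothetical parameter `band`). The solver is the textbook simplified SMO with a randomly chosen second index and a linear kernel, not the thesis's variant, and it makes no attempt at the noise handling the abstract mentions. All names (`smo_train`, `double_smo`, `toy_data`, `accuracy`) are made up for this example.

```python
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def smo_train(X, y, C=1.0, tol=1e-3, max_passes=5, max_sweeps=200, seed=0):
    """Simplified SMO (random second index) for a linear soft-margin SVM.
    Labels must be +1/-1 and at least two points are needed. Returns (w, b)."""
    rng = random.Random(seed)
    n = len(X)
    K = [[dot(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha, b = [0.0] * n, 0.0

    def err(i):  # prediction error f(x_i) - y_i under the current multipliers
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b - y[i]

    passes = 0
    for _ in range(max_sweeps):  # hard cap so the loop always terminates
        changed = 0
        for i in range(n):
            Ei = err(i)
            violates = (y[i] * Ei < -tol and alpha[i] < C) or \
                       (y[i] * Ei > tol and alpha[i] > 0)
            if not violates:
                continue
            j = rng.choice([k for k in range(n) if k != i])
            Ej = err(j)
            ai, aj = alpha[i], alpha[j]
            if y[i] != y[j]:
                lo, hi = max(0.0, aj - ai), min(C, C + aj - ai)
            else:
                lo, hi = max(0.0, ai + aj - C), min(C, ai + aj)
            eta = 2 * K[i][j] - K[i][i] - K[j][j]
            if lo >= hi or eta >= 0:
                continue
            new_aj = min(hi, max(lo, aj - y[j] * (Ei - Ej) / eta))
            if abs(new_aj - aj) < 1e-6:
                continue
            new_ai = ai + y[i] * y[j] * (aj - new_aj)
            b1 = b - Ei - y[i] * (new_ai - ai) * K[i][i] - y[j] * (new_aj - aj) * K[i][j]
            b2 = b - Ej - y[i] * (new_ai - ai) * K[i][j] - y[j] * (new_aj - aj) * K[j][j]
            if 0 < new_ai < C:
                b = b1
            elif 0 < new_aj < C:
                b = b2
            else:
                b = (b1 + b2) / 2
            alpha[i], alpha[j] = new_ai, new_aj
            changed += 1
        passes = passes + 1 if changed == 0 else 0
        if passes >= max_passes:  # several quiet sweeps in a row: stop
            break
    w = [sum(alpha[k] * y[k] * X[k][d] for k in range(n)) for d in range(len(X[0]))]
    return w, b

def double_smo(X, y, sample_size, band=1.2, C=1.0, seed=0):
    """Two-pass training. Returns (w, b, number of points used in the final pass)."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(X)), sample_size)
    # Pass 1: approximate hyperplane from a random sample.
    w, b = smo_train([X[i] for i in idx], [y[i] for i in idx], C=C, seed=seed)
    # Assumed selection rule: keep points with functional margin <= band.
    keep = [i for i in range(len(X)) if y[i] * (dot(w, X[i]) + b) <= band]
    if len({y[i] for i in keep}) < 2:
        return w, b, sample_size  # nothing useful to retrain on
    # Pass 2: retrain on the candidate support vectors only.
    w, b = smo_train([X[i] for i in keep], [y[i] for i in keep], C=C, seed=seed)
    return w, b, len(keep)

def toy_data(n_per_class=100, seed=1):
    """Two well-separated 2-D Gaussian clusters labelled +1 and -1."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (1, -1):
        for _ in range(n_per_class):
            X.append([rng.gauss(2 * label, 0.5), rng.gauss(2 * label, 0.5)])
            y.append(label)
    return X, y

def accuracy(w, b, X, y):
    hits = sum((1 if dot(w, x) + b >= 0 else -1) == t for x, t in zip(X, y))
    return hits / len(X)
```

Calling `double_smo(X, y, sample_size=40)` on the output of `toy_data()` returns a hyperplane together with the number of points that reached the second pass, which can be compared with `len(X)` to see how much of the data the final SMO run had to hold. The memory saving in this sketch comes from the kernel matrix, which is built only for the sample and then only for the retained candidates.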
Keywords/Search Tags:data mining, support vector machine, training algorithm, sequential minimal optimization algorithm, attribute reduction