
The Research Of Data Mining Based On Support Vector Machine

Posted on: 2009-12-08 | Degree: Master | Type: Thesis
Country: China | Candidate: C S Wang | Full Text: PDF
GTID: 2178360272957280 | Subject: Computer software and theory
Abstract/Summary:
Data mining is a technology for finding underlying rules and extracting novel, useful knowledge from large volumes of data. The support vector machine (SVM) is a new data mining technique and a new tool that uses optimization methods to solve machine learning problems. It is a general-purpose learning machine based on statistical learning theory, with the advantages of global optimization, simple structure, and high practicability. The traditional SVM is a supervised learning algorithm, which requires the labels of the training samples to be known. In practical applications, however, only a few labeled samples are usually available and a large number of samples are unlabeled, a situation the traditional SVM cannot handle. To solve this problem, T. Joachims proposed TSVM, and Chen Yi-song and others improved TSVM and proposed PTSVM. This thesis further improves PTSVM and proposes SDSVM, a semi-supervised classification algorithm that combines the separation degree with the support vector machine. It uses the separation degree from the Fisher criterion as the metric and the Fisher criterion as the evaluation function. The algorithm seeks a separating hyperplane such that, at the end of training, samples with the same label are as close together as possible and samples with different labels are as far apart as possible, with the goal of improving classification accuracy. It also reduces the number of training rounds and the time complexity. The traditional SVM can only handle binary classification; it cannot deal with multiclass problems directly.
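The abstract does not give the exact formula for the separation degree, so the following is only a minimal sketch of a Fisher-style criterion under stated assumptions: samples are projected onto the normal of a candidate hyperplane, and the score is the squared gap between the projected class means divided by the summed within-class scatter. The function names `project` and `fisher_separation` and the toy data are hypothetical, not taken from the thesis.

```python
def project(w, b, x):
    """Signed score of sample x relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fisher_separation(w, b, positives, negatives):
    """Fisher-style separation (assumed form, not the thesis's definition):
    squared gap between projected class means divided by the sum of the
    projected within-class scatters. Larger means better separated."""
    p = [project(w, b, x) for x in positives]
    n = [project(w, b, x) for x in negatives]
    mean_p = sum(p) / len(p)
    mean_n = sum(n) / len(n)
    scatter = (sum((v - mean_p) ** 2 for v in p)
               + sum((v - mean_n) ** 2 for v in n))
    if scatter == 0.0:
        # Every class collapses to a single projected point.
        return float("inf") if mean_p != mean_n else 0.0
    return (mean_p - mean_n) ** 2 / scatter


# Toy two-class data (hypothetical).
positives = [(2.0, 2.0), (3.0, 3.0), (2.5, 3.5)]
negatives = [(-2.0, -2.0), (-3.0, -1.0), (-1.5, -2.5)]

# A direction along the class gap scores high; an orthogonal one scores low.
good = fisher_separation((1.0, 1.0), 0.0, positives, negatives)
poor = fisher_separation((1.0, -1.0), 0.0, positives, negatives)
```

A semi-supervised procedure in this spirit could use such a score to compare candidate labelings of the unlabeled samples and keep the one with the higher separation; how SDSVM actually does this is not stated in the abstract.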
In the real world, most data sets contain more than two classes, so the traditional SVM must be extended to handle multiclass problems. This thesis introduces several SVM algorithms that can handle multiclass problems, such as one-against-rest, one-against-one, DAGSVM, and decision-tree-based SVM, and compares their performance. By analyzing the shortcomings of SDSVM, we further improve it and successfully combine it with multiclass SVM. The results show that SDSVM performs better than PTSVM on semi-supervised classification problems.
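The one-against-rest and one-against-one decompositions named above can be sketched as follows. This is an illustrative sketch only: to stay self-contained it uses a simple centroid-bisecting linear scorer as a stand-in for a trained binary SVM, and the names `train_binary`, `one_vs_rest`, `one_vs_one` and the toy data are assumptions, not the thesis's implementation.

```python
from collections import Counter
from itertools import combinations


def train_binary(pos, neg):
    """Stand-in for a binary SVM (an assumption for brevity): a linear
    scorer whose boundary bisects the segment between class centroids.
    Positive scores favor the `pos` class."""
    dim = len(pos[0])
    cp = [sum(x[i] for x in pos) / len(pos) for i in range(dim)]
    cn = [sum(x[i] for x in neg) / len(neg) for i in range(dim)]
    w = [a - c for a, c in zip(cp, cn)]
    b = -sum(wi * (a + c) / 2 for wi, a, c in zip(w, cp, cn))
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b


def one_vs_rest(data):
    """data maps label -> samples. Train one scorer per class against all
    other classes; predict the class whose scorer is most confident."""
    scorers = {}
    for label, pos in data.items():
        neg = [x for other, xs in data.items() if other != label for x in xs]
        scorers[label] = train_binary(pos, neg)
    return lambda x: max(scorers, key=lambda label: scorers[label](x))


def one_vs_one(data):
    """Train one scorer per pair of classes; predict by majority vote."""
    scorers = {(a, b): train_binary(data[a], data[b])
               for a, b in combinations(sorted(data), 2)}

    def predict(x):
        votes = Counter(a if f(x) > 0 else b for (a, b), f in scorers.items())
        return votes.most_common(1)[0][0]

    return predict


# Toy three-class data (hypothetical).
data = {
    "a": [(0.0, 0.0), (0.5, 0.2), (0.2, 0.6)],
    "b": [(5.0, 0.0), (5.5, 0.4), (4.8, 0.3)],
    "c": [(0.0, 5.0), (0.3, 5.5), (0.4, 4.7)],
}
ovr = one_vs_rest(data)
ovo = one_vs_one(data)
```

For k classes, one-against-rest trains k binary classifiers on the full data, while one-against-one trains k(k-1)/2 classifiers, each on only two classes. DAGSVM reuses the pairwise classifiers but evaluates them along a directed acyclic graph instead of counting votes.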
Keywords/Search Tags:Data mining, Statistical Learning Theory, Support Vector Machine, Semi-Supervised Classification, MultiClass