
The Application And Improvement Of SVM Algorithm In Imbalanced Datasets

Posted on: 2015-05-01
Degree: Master
Type: Thesis
Country: China
Candidate: H H Zhuang
Full Text: PDF
GTID: 2308330461974885
Subject: Computer technology
Abstract/Summary:
With the development of computer networks and the continuous growth of information, computer hardware and software have reached an unprecedented level. People can now monitor remote devices or access remote databases without leaving their homes or offices. At the same time, the real world produces massive amounts of valuable data, and processing it for tasks such as credit card fraud detection, case searching, text classification and network intrusion detection brings great value. Traditional classification algorithms, which are designed for typical, balanced sample data, classify the minority class of such datasets poorly and are therefore inefficient in practice. To address classification problems, Vapnik et al. established the SVM algorithm, which is widely used in many fields. However, the SVM algorithm has some deficiencies. For example, on imbalanced data its separating hyperplane tends to be skewed toward the minority class, which leads to poor classification performance on that class. In response to these deficiencies, improvements to the SVM algorithm can generally be divided into two categories: refactoring the datasets and improving the algorithm. Building on a discussion of these two categories, this paper designs a new algorithm to process imbalanced datasets and improve classification accuracy on the minority class.

First, this paper introduces the basic concepts, research background and evaluation criteria for imbalanced datasets, and briefly explains the concepts, formulas and contents of machine learning and statistical learning. It also presents the general principles and concepts of SVM and discusses the linearly separable and non-linear cases.

Second, this paper introduces two directions for improving the SVM algorithm. The first is refactoring the sample: the paper proposes an algorithm named newSMOTE, which is combined with SVM to improve the prediction of sample classes. The second is improving the algorithm itself by introducing particle swarm optimization (PSO): PSO-SVM is designed to improve the prediction of sample classes in combination with newSMOTE.

Last, the SVM algorithm is used to classify and predict the performance reviews of primary and secondary schools in Taijiang District. The teacher evaluation data are first transformed into feature vectors. The data are then fed into the trained model to be predicted and labeled. The actual prediction of teacher evaluations verifies the validity of the improved algorithm.
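The data-refactoring direction mentioned in the abstract can be illustrated with a minimal sketch of classic SMOTE-style oversampling. This is not the thesis's newSMOTE, whose details are not given here; the function names `smote_oversample` and `g_mean`, the parameter `k`, and the choice of G-mean as the evaluation measure are all assumptions made for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points (classic SMOTE idea, assumed here).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the base point, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # interpolation position in [0, 1)
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of minority recall and majority recall.

    Chosen for illustration: unlike plain accuracy, it drops to zero when
    the minority (positive) class is never predicted.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        return 0.0
    return math.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))


if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    print(smote_oversample(minority, n_new=3, k=2))
    # A classifier that always predicts the majority class scores 75%
    # accuracy on this toy label set, yet its G-mean is zero.
    print(g_mean([1, 0, 0, 0], [0, 0, 0, 0]))
```

In such a setup the synthetic points would be appended to the training set before fitting the SVM, and the classifier would be judged on a measure that is sensitive to minority-class errors rather than on overall accuracy.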
Keywords/Search Tags: SVM, imbalanced datasets, teacher evaluation system, LIBSVM